VSS force-switchover failed

vokabakov
Level 1

Hi all,

I need to move the currently active chassis (switch 1) of our VSS, so I wanted to use force-switchover to swap their roles. Unfortunately, after "redundancy force-switchover" the former active chassis did not come up properly. I could see it in "sh switch virtual" as Standby and the former Standby was Active, but I did not see any modules ("show module switch 1" was empty) and all lines of switch 1 (sh ip int b) were down.

On switch 2 I saw that its VSL control link Te2/5/4 was in an up/down state.

After a reload of the whole VSS it came back to the original configuration (switch 1 active, switch 2 standby) and everything works properly.

 

Could it be caused by the higher priority of switch 1 (switch 1 priority 110)? If not, what could cause this issue?
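
For context, the priority part of our VSS config is roughly like this (the domain number below is just a placeholder, not our real one):

switch virtual domain 100
 switch 1 priority 110
 switch 2 priority 100

As far as I understand, without a preempt statement the higher priority alone should not pull switch 1 back to active after a switchover, so I am not sure the priority explains it.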

Richard

 


14 Replies

Leo Laohoo
Hall of Fame

I suspect it has gone into ROMmon, probably because the IOS versions are not the same.

Hi Leo,
thank you for your response.
I don't think it is that kind of issue. Both switches are currently in the VSS and working without any problem. I would expect that if one of them had a different IOS they would not be able to form the virtual switch at all. Right now switch 1 is active and switch 2 is standby, and I would like to just swap them, because I need to move switch 1 to a different location on Monday and I would like to minimize the impact on our users.
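
For what it's worth, both supervisors report the same image under show redundancy, so I don't think it's an IOS mismatch; the filtered output looks something like this:

RTR#sh redundancy | include Image Version
Image Version = Cisco IOS Software, s72033_rp Software (s72033_rp-IPSERVICESK9_WAN-M), Version 12.2(33)SXJ6, RELEASE SOFTWARE (fc3)
Image Version = Cisco IOS Software, s72033_rp Software (s72033_rp-IPSERVICESK9_WAN-M), Version 12.2(33)SXJ6, RELEASE SOFTWARE (fc3)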
Richard

Is the IOS loaded in bootflash and slave bootflash on the switch that failed?

Yes, it is.

boot system flash sup-bootdisk:/s72033-ipservicesk9_wan-mz.122-33.SXJ6.bin

 

RTR#sh sup-bootflash:
-#- --length-- -----date/time------ path
1 175267636 Apr 9 2011 09:26:08 +02:00 s72033-ipservicesk9_wan-vz.122-33.SXI5.bin
2 33554432 Apr 9 2011 09:20:32 +02:00 sea_console.dat
3 33554432 Apr 9 2011 12:07:56 +02:00 sea_log.dat
4 140062020 Oct 21 2015 20:08:12 +02:00 s72033-ipservicesk9_wan-mz.122-33.SXJ6.bin
5 1769720 Apr 26 2018 12:04:18 +02:00 tftp

640319488 bytes available (384237568 bytes used)

 

RTR#sh slavesup-bootflash:
-#- --length-- -----date/time------ path
1 175267636 Apr 9 2011 09:26:06 +02:00 s72033-ipservicesk9_wan-vz.122-33.SXI5.bin
2 33554432 Apr 9 2011 09:20:26 +02:00 sea_console.dat
3 33554432 Apr 9 2011 12:15:46 +02:00 sea_log.dat
4 19907 Apr 3 2013 10:03:46 +02:00 Vc_Qos_Tests_2413
5 19069 Dec 23 2013 15:55:52 +01:00 Before_Sense_Chg_231213
6 140062020 Oct 21 2015 19:59:56 +02:00 s72033-ipservicesk9_wan-mz.122-33.SXJ6.bin

642039808 bytes available (382517248 bytes used)

 

RTR#sh sup-bootdisk:
-#- --length-- -----date/time------ path
1 175267636 Apr 9 2011 09:26:08 +02:00 s72033-ipservicesk9_wan-vz.122-33.SXI5.bin
2 33554432 Apr 9 2011 09:20:32 +02:00 sea_console.dat
3 33554432 Apr 9 2011 12:07:56 +02:00 sea_log.dat
4 140062020 Oct 21 2015 20:08:12 +02:00 s72033-ipservicesk9_wan-mz.122-33.SXJ6.bin
5 1769720 Apr 26 2018 12:04:18 +02:00 tftp

640319488 bytes available (384237568 bytes used)

RTR#sh slavesup-bootdisk:
-#- --length-- -----date/time------ path
1 175267636 Apr 9 2011 09:26:06 +02:00 s72033-ipservicesk9_wan-vz.122-33.SXI5.bin
2 33554432 Apr 9 2011 09:20:26 +02:00 sea_console.dat
3 33554432 Apr 9 2011 12:15:46 +02:00 sea_log.dat
4 19907 Apr 3 2013 10:03:46 +02:00 Vc_Qos_Tests_2413
5 19069 Dec 23 2013 15:55:52 +01:00 Before_Sense_Chg_231213
6 140062020 Oct 21 2015 19:59:56 +02:00 s72033-ipservicesk9_wan-mz.122-33.SXJ6.bin

642039808 bytes available (382517248 bytes used)

Any crash files when you run "dir all-filesystems"?
They are sometimes located in slavecrashinfo; you may see a crash file with a timestamp matching the time you had your issue.
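
Something along these lines, for example - the crashinfo filesystem names can differ slightly depending on the supervisor, so adjust as needed:

dir all-filesystems
dir crashinfo:
dir slavecrashinfo: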

Nope. There is no crash file anywhere :-/

 

It is quite strange, because when I issued the force-switchover, switch 2 became active without any issue, and after a while I could also see that switch 1 had booted up and become Standby as I needed. So the VSL had to be communicating with both chassis and see switch 1 up, but there was some issue on the protocol layer. The VSL interfaces on switch 2 had Status up but Protocol down.

When I forced a reload of the whole VSS, both chassis rebooted and came back to the original state - switch 1 Active and switch 2 Standby - and everything started to work again :-/
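
If it happens again I will also try to capture the VSL state itself while it is broken, e.g.:

show switch virtual link
show interfaces tenGigabitEthernet 2/5/4

(Te2/5/4 being one of our VSL links, as mentioned above.)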

The fact it didn't generate a crash file may mean it was just in some hung state, a jammed process etc. Now that it has rebooted, that has most likely wiped/reset the issue, so finding the actual problem may be very difficult unless you can replicate it and debug it while it takes place, capturing everything.

You could check with TAC, or check the release notes for the version you're running to see if any known bugs match what you saw. Running a clean show tech now may not provide anything after the reboot.

I will try it again tomorrow during the day and see. Maybe it was caused just by some strange circumstances. I also tried reloading only switch 1 with "redundancy reload shelf 1" but it did not help. Maybe something was stuck on switch 2, which became active.

If you're able to replicate it, set the terminal to record everything from the start.
Has this worked for you previously, or have you never initiated a failover this way before? If it's untested there may be something in the config causing it (or missing) - it may be worth checking everything again.

Make sure nothing looks off in the outputs from these commands; just some things I'd check to be sure it's not on your end:
show redundancy and sh vsl 1 lmp status / sh vsl 2 lmp status / sh switch virtual ro

If it happens again and the config is all good, move off the version and retest if there is no TAC support.
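
As a rough sketch, something like this at the start of the capture session so nothing gets cut off (your terminal client does the actual logging; run the same set again once the issue shows up):

terminal length 0
show redundancy
sh switch virtual ro
sh vsl 1 lmp status
sh vsl 2 lmp status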


Thank you Mark.

I just tried the commands you posted and maybe there is an issue there. When I run "sh vsl 1 lmp status" it reports everything working properly with no failure, but for 2 it returns an empty result :-/. I guess that is not a correct result, right?

 

RTR#sh vsl 1 lmp status

  LMP Status

          Last operational        Current packet          Last Diag   Time since
Interface Failure state           State                   Result      Last Diag
-------------------------------------------------------------------------------
Te1/5/4   No failure              Hello bidir             Never ran   --
Te1/5/5   No failure              Hello bidir             Never ran   --

 

RTR#sh vsl 2 lmp status

RTR#

 

RTR#sh switch virtual ro

        Switch  Switch Status  Priority  Role     Session ID
        Number  Oper(Conf)                        Local  Remote
------------------------------------------------------------------
LOCAL   1       UP             110(110)  ACTIVE   0      0
REMOTE  2       UP             100(100)  STANDBY  6034   3756


In dual-active recovery mode: No

 

RTR#sh redundancy
Redundant System Information :
------------------------------
Available system uptime = 15 hours, 0 minutes
Switchovers system experienced = 0
Standby failures = 0
Last switchover reason = none

Hardware Mode = Duplex
Configured Redundancy Mode = sso
Operating Redundancy Mode = sso
Maintenance Mode = Disabled
Communications = Up

Current Processor Information :
-------------------------------
Active Location = slot 1/5
Current Software state = ACTIVE
Uptime in current state = 14 hours, 59 minutes
Image Version = Cisco IOS Software, s72033_rp Software (s72033_rp-IPSERVICESK9_WAN-M), Version 12.2(33)SXJ6, RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2013 by Cisco Systems, Inc.
Compiled Fri 19-Jul-13 03:30 by prod_rel_team
BOOT = sup-bootdisk:/s72033-ipservicesk9_wan-mz.122-33.SXJ6.bin,1;
Configuration register = 0x2102

Peer Processor Information :
----------------------------
Standby Location = slot 2/5
Current Software state = STANDBY HOT
Uptime in current state = 14 hours, 55 minutes
Image Version = Cisco IOS Software, s72033_rp Software (s72033_rp-IPSERVICESK9_WAN-M), Version 12.2(33)SXJ6, RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2013 by Cisco Systems, Inc.
Compiled Fri 19-Jul-13 03:30 by prod_rel_team
BOOT = sup-bootdisk:/s72033-ipservicesk9_wan-mz.122-33.SXJ6.bin,1;
Configuration register = 0x2102

Yes, that doesn't look right - it should be like the output below from one of my VSS; you should see both switches. Check that the VSL port config on the second switch is correct; maybe you're using ports 2/5/4 & 2/5/5.

 

 


Executing the command on VSS member switch role = VSS Active, id = 1



  LMP Status

          Last operational        Current packet          Last Diag   Time since
Interface Failure state           State                   Result      Last Diag
-------------------------------------------------------------------------------
Te1/3/7   No failure              Hello bidir             Never ran   --
Te1/3/8   No failure              Hello bidir             Never ran   --


Executing the command on VSS member switch role = VSS Standby, id = 2


1#sh vsl 2 lmp status

Executing the command on VSS member switch role = VSS Active, id = 1



Executing the command on VSS member switch role = VSS Standby, id = 2



  LMP Status

          Last operational        Current packet          Last Diag   Time since
Interface Failure state           State                   Result      Last Diag
-------------------------------------------------------------------------------
Te2/3/7   No failure              Hello bidir             Never ran   --
Te2/3/8   No failure              Hello bidir             Never ran   --

 

 

 

 

##################################################

 

interface TenGigabitEthernet1/3/7
 description Peer VSL Do Not Move
 switchport mode trunk
 switchport nonegotiate
 no lldp transmit
 no lldp receive
 no cdp enable
 channel-group 100 mode on
 service-policy output VSL-Queuing-Policy
end

xir-b101uas01#sh run int Te2/3/7
Building configuration...

Current configuration : 241 bytes
!
interface TenGigabitEthernet2/3/7
 description Peer VSL Do Not Move
 switchport mode trunk
 switchport nonegotiate
 no lldp transmit
 no lldp receive
 no cdp enable
 channel-group 101 mode on
 service-policy output VSL-Queuing-Policy

My fault ... I had to run it directly on the standby switch and not from the active one.

RTR-sdby-sp#sh vslp lmp status

Instance #2:
  LMP Status

          Last operational        Current packet          Last Diag   Time since
Interface Failure state           State                   Result      Last Diag
-------------------------------------------------------------------------------
Te2/5/4   No failure              Hello bidir             Never ran   --
Te2/5/5   No failure              Hello bidir             Never ran   --

But if you ran a forced switchover I would expect to see the output below - last time I did this on my 6509s this was what it showed. You rebooted the chassis afterwards, though, which is probably why it doesn't show it.

I think replicate it and try to capture as much as you can as the issue occurs; if it doesn't happen again it may have been a one-off process issue.

sh vslp lmp status

Instance #1:


  LMP Status

          Last operational         Current packet   Last Diag   Time since
Interface Failure state            State            Result      Last Diag
-------------------------------------------------------------------------------
Te1/5/4   Dis:Peer Reload Request  Hello bidir      Never ran   --
Te1/5/5   Dis:Peer Reload Request  Hello bidir      Never ran   --



Hi Mark,
today I tried the force-switchover again and it was successful :-). It was probably really just some hung state / jammed process etc., as you mentioned yesterday. Anyway, thank you very much for your hints and support :-)
Richard