
VSS split brain after reload

Hi guys,

After executing "reload" on the primary switch of our VSS, the VSS went into dual-active once the primary switch was up and running again.

We have 2 x 10gig VSL links and no dual-active detection link.

Is this normal behaviour? A reload should be equivalent to a total power failure, and after power is restored, operations should continue as before.

Regards,

Michael 


Reza Sharifi
Hall of Fame

Hi,

For the VSS to behave correctly, you need to have dual-active detection configured. You can do it using fast-hello or a port channel connecting from an access switch. Fast-hello is easier if you have a spare 1-gig port on each switch.

HTH

When reading the configuration guide for VSS, I understand that the dual-active link is only used if both of my VSL links die. Is that correct, or am I reading it wrong?

I see no difference between a reload of the primary and a fresh boot. According to the documentation, there is no preemption, and the priority is only used when both chassis boot.

When reading the configuration guide for VSS, I understand that the dual-active link is only used if both of my VSL links die. Is that correct, or am I reading it wrong?

That is correct, but you still need to configure some form of dual-active detection.

Dual-Active fast-hello employs fast-hello Layer 2 messages over a direct Ethernet connection. When the VSL goes down, the event is communicated to the peer switch. If the switch was operating as the active before the VSL went down, it goes into recovery mode upon receipt of a VSL down indication from the peer switch. This method is faster than IP BFD and ePAGP and does not require a neighboring switch.

http://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst4500/XE3-7-0E/15-23E/configuration/guide/xe-370-configuration/vss.html#pgfId-1063718

HTH

In two weeks we have planned night-time maintenance, and we'll test different scenarios on the core and the VSS config then. When we have the results, I'll post them.

Follow up on previous post...

Our Cat4506E with Sup8 was running v3.6.3, and with this image the VSS failed after one of the switches reloaded. If we power-cycled the active switch, both switches would become active afterwards.

Network traffic would become erratic, and we noticed that the power-cycled switch showed as being in standby state, but the network symptoms suggested that we had a dual-active scenario.

(The dual-active scenario seems most likely, but we're not ruling out other bugs!)

On the active switch, "show switch virtual redundancy", "show switch virtual" and "show redundancy" all showed the power-cycled switch in standby.

With the new image, 3.6.5, the previously active switch comes back in standby state after a power cycle. This was not the case before.

This new behavior matches Cisco's documentation on VSS and failover: the two switches do not both come online as active after a complete shutdown.

Hello

Network traffic would become erratic, and we noticed that the power-cycled switch showed as being in standby state, but the network symptoms suggested that we had a dual-active scenario.

Hope this helps. It does sound like a dual-active scenario. I have seen this a few times recently when troubleshooting new sites after a VSS introduction.

Below is what I have used in the past to understand and explain VSS. CCO has lots of URLs with excellent information on this, and below is an extract from one of them.



The VSS standby switch monitors the VSS active switch using the VSL. If it detects failure, the VSS standby switch initiates a switchover and takes on the VSS active role. When the failed switch recovers, it takes on the VSS standby role.

If either the VSS active switch fails or all links that belong to the VSL port-channel fail, the VSS standby switch initiates a switchover and assumes the role of the VSS active switch.

If the previous VSS active switch has failed, it reloads and boots as the VSS standby switch. However, if only the VSL port-channel failure caused the switchover, the previous VSS active switch enters recovery mode (provided dual-active detection is configured).

In this scenario, the previous VSS active chassis (now in recovery mode) carries no traffic and only monitors the VSL link.

When one link in the VSL port-channel is up, the recovery mode switch reloads and boots as a VSS standby chassis.


Dual-Active Detection
If the VSL fails, the VSS standby switch cannot determine the state of the VSS active switch. To ensure that switchover occurs without delay, the VSS standby switch assumes the VSS active switch has failed and initiates switchover to take over the VSS active role.

If the original VSS active switch is still operational, both switches are now VSS active.

This situation is called a dual-active scenario. A dual-active scenario can have adverse effects on network stability, because both switches use the same IP addresses.
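
To make the rules in the extract above concrete, here is a minimal sketch of them as a Python decision function. It is only an illustration: the names roles_after_event and Outcome are made up for this post, the role strings are informal labels, and it assumes that "dual-active detection configured" also means the detection path itself is working. It is not how the supervisors actually negotiate roles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    old_active: str   # role of the chassis that was VSS active before the event
    old_standby: str  # role of the chassis that was VSS standby before the event


def roles_after_event(active_chassis_up: bool, vsl_up: bool,
                      dual_active_detection: bool) -> Outcome:
    """Return the roles right after an event, following the quoted rules.

    Hypothetical helper for illustration only (assumption, not Cisco code).
    """
    if active_chassis_up and vsl_up:
        # Nothing failed: roles are unchanged.
        return Outcome("active", "standby")
    if not active_chassis_up:
        # The active chassis failed: the standby takes over, and the failed
        # chassis later reloads and boots as standby.
        return Outcome("down (rejoins as standby)", "active")
    # The active chassis is healthy but every VSL link is down. The standby
    # cannot tell this apart from a chassis failure and takes over anyway.
    if dual_active_detection:
        # The old active enters recovery mode and carries no traffic.
        return Outcome("recovery", "active")
    # Without detection, nothing stops both chassis from being active.
    return Outcome("active", "active")
```

With all VSL links down, both chassis up and no detection configured, roles_after_event(True, False, False) reports both chassis as active, which is the dual-active scenario described above. With detection configured, the same event leaves the old active in recovery mode instead.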



Fail over functionalities
A potential issue is losing connectivity on the VSL (the aggregate link joining the two switches that make up the VSS core).

If for some reason this link is lost between the two switches, both switches will think they are active at the same time (this is called a dual-active scenario).

If this occurs, inter-VLAN routing will be disrupted and multiple errors will be reported, such as duplicate IP addressing, STP issues, etc. To prevent this, dual-active detection can be implemented in 3 ways.

Note: all 3 failover functionalities will show up as enabled in VSS even when not configured, which I guess is a bit misleading.

Enhanced PAgP (requires the MECs to the access switches to be PAgP aware): will require reconfiguration if the interconnects to the access switches do not run PAgP

IP BFD (Bidirectional Forwarding Detection): requires direct L3 peering between the VSS switches but doesn't rely on PAgP on the MECs

Dual-active fast-hello: requires direct L2 peering between the VSS switches but doesn't rely on PAgP on the MECs

Enhanced PAgP
Dual-active fast-hello
Both of these are faster in failover than IP BFD

Regards
Paul


Please rate and mark as an accepted solution if you have found any of the information provided useful.
This then could assist others on these forums to find a valuable answer and broadens the community’s global network.

Kind Regards
Paul

Great Overview of the scenarios Paul. Helped me a lot as I am doing some VSS testing. Old post I know but credit due. Cheers