
StackWise and StackWise Virtual Difference

Hc-angie-chen
Level 1

Hello, 

I'm new to Cisco and currently studying StackWise and StackWise Virtual. Both technologies seem to pursue the same goal of high availability and provide a single management interface. I wonder why both exist, especially since StackWise is supported on access/edge switches while StackWise Virtual is supported on distribution/core switches.

The main difference I found is that StackWise supports up to 8 or 9 switches over a limited distance, while StackWise Virtual supports 2 switches across a fiber link. However, I wonder if there are specific capabilities that StackWise offers but StackWise Virtual does not, or vice versa? Is the hardware limitation the main reason for the two technologies, or are there other software or feature differences I might be missing?

I actually found the Cisco documentation below, which mentions that StackWise provides Platform/System Resiliency, while StackWise Virtual focuses on Network/Operational Resiliency, but I don't understand, from a technical standpoint, what exactly the distinction is. Could any expert help clarify?

https://www.ciscolive.com/c/dam/r/ciscolive/emea/docs/2023/pdf/BRKENS-2095.pdf

Any input is greatly appreciated. 

Thank you.

Best regards,

Angie


12 Replies

@Hc-angie-chen 

StackWise (without the "Virtual") is the traditional stack we all know. It's been around for a long, long time.

StackWise Virtual is something newer compared with traditional StackWise. At the end of the day, they serve the same function: connecting switches together to create one logical switch with double the interface count.

I don't believe you miss any feature by using one or the other, but they have different purposes on the network. StackWise you use for access switches, in order to increase interface capacity. StackWise Virtual, in the past also called VSS, you use to connect core or distribution switches with redundancy as the objective, not as a high-density interface system.

For access switches we need as many ports as we can get to connect endpoints; the max per switch is 48 ports, so to get more ports we stack multiple switches together.

For aggregation and core layer switches we need high availability and redundancy of L2/L3 services, and hence we need multiple supervisors and use StackWise Virtual to connect these supervisors (switches).

MHM

Hi MHM, 

Thanks a lot for your reply.

Since StackWise already implements an Active/Standby switch mechanism to achieve high availability with link, device, and power redundancy in failure scenarios, could you explain how Multi-Supervisor in StackWise Virtual provides additional benefits or contributes to an even “higher” level of availability or redundancy beyond what StackWise already offers? Or is there any document that I can refer to? 

Again, thank you for your insights!

Best regards,

Angie

Unsure whether it still applies, but I recall (?) that, in the past, VSS took less of a hit to transit traffic than StackWise.

Also, I recall (?) that VSS eventually supported dual sups in each chassis, so each VSS member was less likely to fail.

Assuming my recollections are accurate, these features would usually be more important in a core or distribution role than at a user access edge.  (For servers, well, there's Nexus, which has its own redundancy architecture.  Interestingly, a Nexus pair still operates as individual devices.  [There's also ACI, but I don't want to digress further.])

@Hc-angie-chen, when you look at the diagram @MHM Cisco World provided, the way it's diagrammed there doesn't appear to be much difference between using StackWise vs. SVL; in this diagram, there isn't.

Notably, the diagram shows core and distro pairs (?) for both StackWise and SVL.  In a single-pair relationship, StackWise appears to offer about the same capabilities and capacity as SVL, but you need to consider the capabilities/capacity of the platform being used and whether we're doing L2 or L3.

With L3, such as between core and distro, SVL does what the diagram says: "Main goal is to simplify Distribution or Core layer".  Consider that L3 between devices can already use multiple paths, all paths, and shortest paths.  I.e., what does SVL actually offer compared to just using L3?  Actually, not much.  It halves the number of "devices" to manage, but makes their configuration more complex.  Possibly it even reduces reliability, as SVL software, being more complex, may be more likely to have a bug.  It also removes some options, such as using two different peer devices.  Frankly, for a pair of core devices, like in the diagram, we often decided to keep them as independent L3 devices.  (BTW, when I describe SVL operation, for its data-path usage it much mimics L3, as L3 peers don't generally send traffic sideways unless it's the only way to reach the destination.)

StackWise or SVL are very nice for L2.  Either avoids all the issues that come with spanning tree (you can use all links, via EtherChannel, to multiple physical switches, because they are now just one logical switch).  However, since there's one logical device, traffic might logically move sideways between peers.

Consider using a 6513 with Sup2Ts.  (I'm using old technology, as I'm very familiar with it.)  You have 11 line-card slots, each providing 80 Gbps, duplex, which is non-blocking with the 2 Tbps fabric.  If dual peers are logically one device, how do you interconnect them to also provide for 4 Tbps?  With SVL, the peers do not want to pass data between themselves, so if you design for that, it's a non-issue.  With StackWise, you have to try to provide the necessary bandwidth, because the peers do not try to avoid using each other.

Again, with just two peers, for StackWise, other than the extra hop when you unnecessarily use a peer, there may be enough ring bandwidth that the ring is not oversubscribed (as it may be with a larger number of stack members).
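
To put rough numbers on that, here's a quick back-of-the-envelope sketch (plain Python, nothing Cisco-specific; the per-slot figure follows the 6513/Sup2T example above, and the member-to-member interconnect options are illustrative assumptions, not platform specs):

    # Back-of-the-envelope: why a peer link or stack ring can't be treated as
    # "free" bandwidth between members. Figures follow the 6513/Sup2T example
    # above; the interconnect options below are illustrative assumptions.

    SLOTS_PER_CHASSIS = 11   # usable line-card slots
    GBPS_PER_SLOT = 80       # per-slot bandwidth, duplex

    front_panel = SLOTS_PER_CHASSIS * GBPS_PER_SLOT
    print(f"Per-chassis line-card bandwidth: {front_panel} Gbps")

    # Hypothetical member-to-member interconnects:
    interconnects = {
        "2x10G VSL/SVL peer link": 20,
        "4x10G VSL/SVL peer link": 40,
        "StackWise-480 ring": 480,
    }

    for name, gbps in interconnects.items():
        ratio = front_panel / gbps
        print(f"{name}: {gbps} Gbps -> {ratio:.1f}:1 oversubscribed "
              "if all of one member's traffic went sideways")

Even with generous assumptions, the interconnect ends up heavily oversubscribed, which is why SVL's preference for local egress matters, and why classic StackWise instead throws a relatively fat ring at the problem.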

However, traditionally, StackWise switches do not have the hardware resources of SVL-class switches, i.e. they may have performance issues unrelated to the stack itself.

Again, going back years, Cisco had the 3750G with 48 copper gig ports (plus uplinks) and the 4948 with 48 copper gig ports.  Hmm, both have 48 copper ports, so why was the 4948 so much more expensive?

Well, the 3750G fabric was only 32 Gbps (enough for 16 gig ports) while the 4948 fabric was 96 Gbps.  So, as many found out (peruse old forum posts), 3750Gs didn't do well in distro or core roles, or even server-edge roles, if they were "busy".
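
For the fabric numbers quoted above, the oversubscription math looks roughly like this (plain Python; port and fabric figures as quoted, counted full-duplex, so treat it as an approximation):

    # Rough oversubscription check for the 3750G vs. 4948 example above.
    # Bandwidth is counted full-duplex (one busy 1 Gbps port = 2 Gbps of
    # fabric demand), which matches how the 32/96 Gbps figures were quoted.

    def oversubscription(ports, port_gbps, fabric_gbps):
        demand = ports * port_gbps * 2   # full-duplex demand on the fabric
        return demand / fabric_gbps

    for model, fabric_gbps in (("3750G", 32), ("4948", 96)):
        ratio = oversubscription(ports=48, port_gbps=1, fabric_gbps=fabric_gbps)
        verdict = "non-blocking" if ratio <= 1 else f"about {ratio:.0f}:1 oversubscribed"
        print(f"{model}: 48 x 1G against a {fabric_gbps} Gbps fabric -> {verdict}")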

Cisco then came out with the 3750E, which also had a non-blocking fabric and the PPS to support wire rate for its 48 copper gig ports, much like the 4948.  So, was it now just as good as the 4948?  Better, since it supported stacking, right?  Well, no, because it didn't have the buffer capacity of the 4948, so burst oversubscription of edge ports would often lead to drops not seen on a 4948.  Same problem on the 3750-X, and the 3650 and 3850 too.  Again, if desired, peruse forum posts.  BTW, the 4948 wasn't stackable, but eventually other members of the Catalyst 4K series did obtain VSS.

To recap, both StackWise and SVL offer about the same redundancy and L2 features, but SVL somewhat mimics how L3 routing would operate for path selection.  Basically, it treats using its peer as an extra, high-cost hop, so it generally avoids it.  If you have lots of east-west peer traffic, VSS/SVL is worse than StackWise, yet even StackWise can run into peer<>peer(<>peer...) bandwidth issues.

If you have very high bandwidth requirements between devices at the same level, your best performance option is a single device that supports the bandwidth requirement and port density (one reason large physical devices exist; consider something like Cisco's CRS devices), or a logical multi-device fabric such as Cisco's APIC/ACI.  Otherwise, you use a hierarchical topology, but care needs to be taken in supporting redundancy.

BTW, another vendor's product portfolio, which I've used, has stackable switches that can use high-bandwidth local stacking, or "ordinary" links for non-local stacking, or both concurrently.  I believe they treat both, internally, much like a multi-switch fabric supporting one logical switch.  Logically superior, I believe, to either StackWise or VSS/SVL, but a later technology development than Cisco's.  Further, VSS/SVL in particular was possibly retrofitted onto devices never designed for it, so it's not surprising the other vendor's approach might be logically superior; however, its devices also didn't seem as operationally solid as Cisco's.  (Many of us mention issues with Cisco devices, but compared to many Brand X devices, Cisco devices often work as documented, and when they don't, Cisco actually provides updates so that they do.)

Joseph W. Doherty
Hall of Fame

The big difference between the two StackWise architectures is performance/capacity.

Original StackWise, introduced with the 3750, is a much "nicer" way of using a set of 24- or 48-port cabinet switches cascaded into a ring topology.  From a performance standpoint, though, the original StackWise wasn't very efficient.  The original version placed ALL traffic on the stack ring, even for frames between ports on the same physical switch member.  This traffic circulated around the ring and was removed by the switch member sourcing it.

(IMO, a somewhat surprising approach for a switch.  The only offset was the bandwidth of the stack ring ports [16 Gbps, duplex; advertised as 32 Gbps], which, considering the original 3750 had FE copper ports and a 32 Gbps fabric, was "reasonable", I guess.  The later StackWise Plus variants increased bandwidth and are "smarter" about how traffic is placed on and removed from the ring, at least for unicast traffic.  Still, the ring architecture is not ideal for high performance.)
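
To make "source stripping" concrete, here's a toy model (plain Python, not anything Cisco ships; the member count and ring speed are illustrative): because every frame stays on the ring until it returns to the member that injected it, each ring segment ends up carrying every member's traffic, so keeping traffic local to one physical switch buys you nothing.

    # Toy model of the original StackWise ring described above: every frame
    # goes onto the ring and is removed only when it returns to the member
    # that sourced it ("source stripping"), even when the source and
    # destination ports sit on the same physical switch.

    MEMBERS = 4      # hypothetical 4-switch stack
    RING_GBPS = 16   # per-direction ring bandwidth (marketed as 32 Gbps duplex)

    def segment_load(offered_per_member_gbps):
        """Load on each ring segment when every frame makes a full lap.

        A source-stripped frame crosses every segment, so each segment
        carries the combined traffic of ALL members, regardless of whether
        the destination was local to the sender."""
        return MEMBERS * offered_per_member_gbps

    for offered in (1, 2, 4, 8):
        load = segment_load(offered)
        note = "ring saturated" if load > RING_GBPS else "fits"
        print(f"{offered} Gbps offered per member -> {load} Gbps on every segment ({note})")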

The later StackWise Virtual (originally introduced as VSS on the sup720C or sup720-VSS) is designed to double performance/capacity.  Basically, two like switches, originally the larger chassis models, are designed to be used in parallel, each processing half the traffic.

In this latter architecture, ideally, all connections to the pair are dual connections, often EtherChannel.  Ideally, there is NO data traffic between the pair.  However, the pair will pass data traffic "sideways", but only if there's no other path.

In some respects, this architecture works like the Nexus vPC architecture, especially for performance/capacity.  I recall (?) Nexus vPC predates Catalyst VSS, and if so, why have both?  Multiple reasons come to mind, but perhaps, considering the differences between Nexus and Catalyst, Cisco perceived a desire for StackWise features and/or Nexus performance/capacity on the larger Catalyst switches, and thought to meet such a desire.  (Cisco, BTW, continued to expand Catalyst to somewhat further mimic Nexus FEXs using Catalyst IAs, but they weren't on the market very long.)

Could Cisco extend StackWise Virtual to use more than two devices?  I'm sure they could, but since it can be used on large chassis switches (and considering Catalyst IA didn't seem to do well), Cisco probably doesn't see the value in doing so.

So, for access/edge devices, classic StackWise serves well.  BTW, in various situations StackWise can decrease performance/capacity; it might also increase it, but it needs to be used carefully.

For core or distribution roles, VSS (now known as StackWise Virtual) roughly doubles the performance/capacity.

Hi Joseph, 

Thank you so much for your input—I really appreciate the detailed explanation and background on Cisco stacking technologies.

Referring to your explanation:

"Ideally, all connections to the pair are dual connections, often EtherChannel. Ideally, there is NO data traffic between the pair. However, the pair will pass data traffic 'sideways,' but only if there's no other path."

If I replace the StackWise Virtual pair with a StackWise stack, I could still set up dual connections using MEC, so the main difference lies in the data traffic between the stacked switches. If StackWise Virtual has no data traffic between the pair under ideal conditions, I agree its performance would presumably surpass that of StackWise.

However, I'm trying to understand how StackWise Virtual achieves NO data traffic between switches in practice. From what I studied in the white paper, if there is multicast traffic, it still goes across the link interconnecting the peer switches in StackWise Virtual (which is the same behavior as StackWise). Similarly, wouldn't unicast traffic behave in the same way, given that both technologies use SSO and NSF? In other words, if the route is located on the peer switch, the traffic should still travel across the interconnecting link, or is there some other mechanism that prevents this from happening, so that we can say StackWise Virtual achieves better performance?

Again, tons of appreciation for spending your time reading this and answering my question. 

Best regards,

Angie

VSS/StackWise Virtual has an affinity for using its own member ports for egress.

For example, when using EtherChannel, each member device does not hash flows across all the ports in the bundle; it only hashes flows across its own ports, unless it has none.

It does something similar for L3; I recall not only for ECMP, but even when the peer has a better path.

Basically, the peer is not considered for egress unless it's the only L2/L3 path.
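
Here's a toy sketch of that egress affinity (plain Python, not Cisco's actual hashing code; the port names and CRC hash are stand-ins): given a Multichassis EtherChannel whose member links are split across the two chassis, the forwarding chassis hashes flows only across its own local links and falls back to the peer's links only when it has none left.

    # Toy illustration of VSS/StackWise Virtual egress affinity on a
    # Multichassis EtherChannel (MEC). The CRC hash is a stand-in for the
    # platform's real src/dst hash; port names are made up.

    import zlib

    def pick_egress(flow, bundle, forwarding_chassis):
        """bundle: list of (chassis_id, port_name) tuples in the port-channel."""
        local = [port for chassis, port in bundle if chassis == forwarding_chassis]
        remote = [port for chassis, port in bundle if chassis != forwarding_chassis]
        candidates = local or remote   # peer's links only if no local link is left
        return candidates[zlib.crc32(flow.encode()) % len(candidates)]

    mec = [(1, "Te1/0/1"), (1, "Te1/0/2"), (2, "Te2/0/1"), (2, "Te2/0/2")]

    # Chassis 1 forwards the flow out one of its OWN links:
    print(pick_egress("10.1.1.1->10.2.2.2", mec, forwarding_chassis=1))

    # Only if chassis 1 has no local links left does it use chassis 2's
    # links (which means crossing the SVL peer link):
    print(pick_egress("10.1.1.1->10.2.2.2", [(2, "Te2/0/1"), (2, "Te2/0/2")], 1))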

Why is this done?  Consider all the slot bandwidth on a chassis and say you want to send half of it sideways via the peer.

Regarding multicast, as far as I know, it's the same for L2.  For L3, multicast isn't multipath.

Again, the peer data link doesn't refuse data traffic; it just should only need to be used when a member has no egress interface of its own, physically or (for multicast) logically.

Also again, why not use the peer link routinely?  Firstly, it usually doesn't provide sufficient bandwidth for what a chassis might direct to it.  Secondly, it would add another hop to the path, adding latency.

Classic StackWise attempts to deal with the cross-member bandwidth issue by having the ring provide lots of bandwidth, which, depending on the number and model of member switches, might not be oversubscribed; but it adds latency and likely doesn't try to minimize member hops.

BTW, could you provide a reference for multicast routinely using the peer path?  Possibly that's in reference to L3, as multicast is not multipath.

Hi Joseph, 

Thanks again for the clear explanation. Regarding multicast using the peer path, I was actually referring to the data flow for L2 multicast, and thanks to your response I now understand that the two technologies don't differ much from an L2 perspective (which is the reason I was confused in the beginning).

(White paper: https://www.cisco.com/c/en/us/products/collateral/switches/catalyst-9000/nb-06-cat-9k-stack-wp-cte-en.html)

Quote "For traffic that must be flooded on the VLAN (broadcasts, multicasts, and unknown unicasts), a copy is sent across the StackWise Virtual link to be sent out to any single-homed ports belonging to the VLAN." Unquote

After reading all the explanations, below is my current understanding; I hope it aligns with yours.

From a software perspective, the primary difference between StackWise and StackWise Virtual lies in their handling of Layer 3 traffic. StackWise uses source stripping for multicast traffic, while StackWise Virtual relies on IGMP/PIM and exchanges multicast forwarding information via the MFIB. (I believe StackWise Virtual can use IGMP to eliminate L2 multicast traveling between the peers as well, and it seems that StackWise cannot run IGMP between the active switch and a member switch to reduce multicast traffic on the backplane.)

For unicast traffic, since StackWise treats all member ports as if they belong to the same switch, if the destination route is reachable through a port on another switch in the stack, the traffic will traverse the backplane to reach that port. In contrast, StackWise Virtual may (?) assign a higher metric to the peer connection, making the peer path preferable only when no other Layer 2 or Layer 3 path is available.

I'll check whether I can find an exact example or reference online, but at least I now have a clearer picture and know why. Thanks a lot for your input.

Best regards,

Angie 

Regarding L2 multicast and VSS/SVL: remember, when using that technology, all downstream devices should have connections to both peers, so for downstream traffic there should hopefully be no need for such a stream to go sideways; but even if it (or a broadcast) does, it is replicated and usually is not a huge bandwidth consumer across the peer link.

With the two technologies, in a classical 3-tier network where core-to-distro is L3 and distro-to-access is L2, and since L3 doesn't benefit much, you might have neither on the core, SVL on distro, and StackWise at access.  That's a rough generalization; the two technologies can often be interchanged, although the choice is often platform limited.

Deciding between the two often goes hand in hand with platform selection, or with whether to use them at all.  For example, at the access edge, do you use a stack or a chassis, as the latter often provides more redundancy options?  Or, if the edge supports L3, does the distro, or a collapsed core, need either?

Often, the most important consideration when designing networks is budget.  That may be a big factor in choosing either technology, or in being able to choose either at all.

Leo Laohoo
Hall of Fame

StackWise Virtual is just a marketing term, and it is very much the same as VSS.  VSS is plain "classic" IOS, while StackWise Virtual is IOS-XE.