
StackWise and StackWise Virtual Difference

Hc-angie-chen
Level 1

Hello, 

I'm new to Cisco and currently studying StackWise and StackWise Virtual. Both technologies seem to pursue the same goal of high availability and provide a single management interface. I wonder why both exist, especially since StackWise is supported on access/edge switches while StackWise Virtual is supported on distribution/core switches.

The main difference I found is that StackWise supports up to 8 or 9 switches over a limited distance, while StackWise Virtual supports 2 switches across a fiber link. However, I wonder if there are specific capabilities that StackWise offers but StackWise Virtual does not, or vice versa? Is the hardware limitation the main reason for the two technologies, or are there other software or feature differences I might be missing?

I actually found the Cisco documentation below, which mentions that StackWise provides Platform/System Resiliency, while StackWise Virtual focuses on Network/Operational Resiliency, but I don't understand, from a technical standpoint, what exactly the distinction is. Could any expert help clarify?

https://www.ciscolive.com/c/dam/r/ciscolive/emea/docs/2023/pdf/BRKENS-2095.pdf

Any input is greatly appreciated. 

Thank you.

Best regards,

Angie


12 Replies

@Hc-angie-chen 

StackWise (without the "Virtual") is the traditional stack we all know. It's been around for a long, long time.

StackWise Virtual is something newer compared with traditional StackWise. At the end of the day, they serve the same function: connecting switches together to create one logical switch with double the interface count.

I don't believe you miss any feature by using one or the other, but they have different purposes on the network. StackWise you use for access switches, in order to increase interface capacity. StackWise Virtual, in the past also called VSS, you use to connect core or distribution switches with redundancy as the objective, not as a high-density interface system.

For access switches we need as many ports as we can get to connect endpoints; the max per switch is 48 ports, so to get more ports we stack multiple switches together.

For aggregation and core layer switches we need high availability and redundancy of L2/L3 services, and hence we need multiple supervisors and use StackWise Virtual to connect these supervisors (switches).

MHM

Hi MHM, 

Thanks a lot for your reply.

Since StackWise already implements an Active/Standby switch mechanism to achieve high availability with link, device, and power redundancy in failure scenarios, could you explain how Multi-Supervisor in StackWise Virtual provides additional benefits or contributes to an even “higher” level of availability or redundancy beyond what StackWise already offers? Or is there any document that I can refer to? 

Again, thank you for your insights!

Best regards,

Angie

Unsure whether it still applies, but I recall (?) that, in the past, VSS took less of a hit to transit traffic than StackWise.

Also, I recall (?) that VSS eventually supported dual sups in each chassis, so each VSS member was less likely to fail.

Assuming my recollections are accurate, these features would usually be more important in a core or distribution role than at a user access edge.  (For servers, well, there's Nexus, which has its own redundancy architecture.  Interestingly, a Nexus pair still operates as individual devices.  [There's also ACI, but I don't want to digress further.])

@Hc-angie-chen, when you look at the diagram @MHM Cisco World provided, the way it's diagrammed there doesn't appear to be much difference between using StackWise vs. SVL; in this diagram, there isn't.

Notably, the diagram shows core and distro pairs (?) for both StackWise and SVL.  In a single-pair relationship, StackWise appears to offer about the same capabilities and capacity as SVL, but you need to consider the capabilities/capacity of the platform being used and whether we're doing L2 or L3.

With L3, such as between core and distro, SVL does what the diagram says: "Main goal is to simplify Distribution or Core layer".  Consider that L3 between devices can already use multiple paths, all paths, and shortest paths.  I.e., what does SVL actually offer compared to just using L3?  Actually, not much.  It halves the number of "devices" to manage, but makes their configuration more complex.  Possibly it even reduces reliability, as SVL software, being more complex, may be more likely to have a bug.  It also removes some options, such as using two different peer devices.  Frankly, for a pair of core devices, like in the diagram, we often decided to keep them as independent L3 devices.  (BTW, when I describe SVL operation, for its data-path usage it much mimics L3, as L3 peers don't generally send traffic sideways unless it's the only way to reach the destination.)

StackWise or SVL are very nice for L2.  Either avoids all the issues that come with spanning tree (you can use all links, via EtherChannel, to multiple physical switches, because they are now just one logical switch).  However, since there's one logical device, traffic might logically move sideways between peers.

Consider using a 6513 with Sup2Ts.  (I'm using old technology, as I'm very familiar with it.)  You have 11 line-card slots, each providing 80 Gbps, duplex, which is non-blocking with the 2 Tbps fabric.  If dual peers are logically one device, how do you interconnect them to also provide for 4 Tbps?  With SVL, the peers do not want to pass data between themselves, so if you design for that, it's a non-issue.  With StackWise, you have to try to provide the necessary bandwidth, because the peers do not try to avoid using each other.

Again, with just two peers, for StackWise, other than the extra hop when you unnecessarily use a peer, there may be enough ring bandwidth that the ring is not oversubscribed (as it may be with a larger number of stack members).
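
To put rough numbers on that, here's a quick back-of-the-envelope sketch (plain Python, nothing Cisco-specific; the per-slot figure follows the 6513/Sup2T example above, and the member-to-member interconnect options are illustrative assumptions, not platform specs):

    # Back-of-the-envelope: why a peer link or stack ring can't be treated as
    # "free" bandwidth between members. Figures follow the 6513/Sup2T example
    # above; the interconnect options below are illustrative assumptions.

    SLOTS_PER_CHASSIS = 11   # usable line-card slots
    GBPS_PER_SLOT = 80       # per-slot bandwidth, duplex

    front_panel = SLOTS_PER_CHASSIS * GBPS_PER_SLOT
    print(f"Per-chassis line-card bandwidth: {front_panel} Gbps")

    # Hypothetical member-to-member interconnects:
    interconnects = {
        "2x10G VSL/SVL peer link": 20,
        "4x10G VSL/SVL peer link": 40,
        "StackWise-480 ring": 480,
    }

    for name, gbps in interconnects.items():
        ratio = front_panel / gbps
        print(f"{name}: {gbps} Gbps -> {ratio:.1f}:1 oversubscribed "
              "if all of one member's traffic went sideways")

Even with generous assumptions, the interconnect ends up heavily oversubscribed, which is why SVL's preference for local egress matters, and why classic StackWise instead throws a relatively fat ring at the problem.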

However, traditionally, StackWise switches do not have the hardware resources of SVL-class switches, i.e. they may have performance issues unrelated to the stack itself.

Again, going back years, Cisco had the 3750G with 48 copper gig ports (plus uplinks) and the 4948 with 48 copper gig ports.  Hmm, both have 48 copper ports, so why was the 4948 so much more expensive?

Well, the 3750G fabric was only 32 Gbps (enough for 16 gig ports) while the 4948 fabric was 96 Gbps.  So, as many found out (peruse old forum posts), 3750Gs didn't do well in distro or core roles, or even server-edge roles, if they were "busy".
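
For the fabric numbers quoted above, the oversubscription math looks roughly like this (plain Python; port and fabric figures as quoted, counted full-duplex, so treat it as an approximation):

    # Rough oversubscription check for the 3750G vs. 4948 example above.
    # Bandwidth is counted full-duplex (one busy 1 Gbps port = 2 Gbps of
    # fabric demand), which matches how the 32/96 Gbps figures were quoted.

    def oversubscription(ports, port_gbps, fabric_gbps):
        demand = ports * port_gbps * 2   # full-duplex demand on the fabric
        return demand / fabric_gbps

    for model, fabric_gbps in (("3750G", 32), ("4948", 96)):
        ratio = oversubscription(ports=48, port_gbps=1, fabric_gbps=fabric_gbps)
        verdict = "non-blocking" if ratio <= 1 else f"about {ratio:.0f}:1 oversubscribed"
        print(f"{model}: 48 x 1G against a {fabric_gbps} Gbps fabric -> {verdict}")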

Cisco then came out with the 3750E, which also had a non-blocking fabric and the PPS to support wire rate for its 48 copper gig ports, much like the 4948.  So, was it now just as good as the 4948?  Better, since it supported stacking, right?  Well, no, because it didn't have the buffer capacity of the 4948, so burst oversubscription of edge ports would often lead to drops not seen on a 4948.  Same problem on the 3750-X, and the 3650 and 3850 too.  Again, if desired, peruse forum posts.  BTW, the 4948 wasn't stackable, but eventually other members of the Catalyst 4K series did obtain VSS.

To recap, both StackWise and SVL offer about the same redundancy and L2 features, but SVL somewhat mimics how L3 routing would operate for path selection.  Basically, it treats using its peer as an extra, high-cost hop, so it generally avoids it.  If you have lots of east-west peer traffic, VSS/SVL is worse than StackWise, yet even StackWise can run into peer<>peer(<>peer...) bandwidth issues.

If you have very high bandwidth requirements between devices at the same level, your best performance option is a single device that supports the bandwidth requirement and port density (one reason large physical devices exist; consider something like Cisco's CRS devices), or a logical multi-device fabric such as Cisco's APIC/ACI.  Otherwise, you use a hierarchical topology, but care needs to be taken in supporting redundancy.

BTW, another vendor's product portfolio, which I've used, has stackable switches that can use high-bandwidth local stacking, or "ordinary" links for non-local stacking, or both concurrently.  I believe they treat both, internally, much like a multi-switch fabric supporting one logical switch.  Logically superior, I believe, to either StackWise or VSS/SVL, but a later technology development than Cisco's.  Further, VSS/SVL in particular was possibly retrofitted onto devices never designed for it, so it's not surprising the other vendor's approach might be logically superior; however, its devices also didn't seem as operationally solid as Cisco's.  (Many of us mention issues with Cisco devices, but compared to many Brand X devices, Cisco devices often work as documented, and when they don't, Cisco actually provides updates so that they do.)

Joseph W. Doherty
Hall of Fame

The big difference between the two StackWise architectures is performance/capacity.

Original StackWise, introduced with the 3750, is a much "nicer" way of using a set of 24- or 48-port cabinet switches cascaded into a ring topology.  From a performance standpoint, though, the original StackWise wasn't very efficient.  The original version placed ALL traffic on the stack ring, even for frames between ports on the same physical switch member.  This traffic circulated around the ring and was removed by the switch member sourcing it.

(IMO, a somewhat surprising approach for a switch.  The only offset was the bandwidth of the stack ring ports [16 Gbps, duplex; advertised as 32 Gbps], which, considering the original 3750 had FE copper ports and a 32 Gbps fabric, was "reasonable", I guess.  The later StackWise Plus variants increased bandwidth and are "smarter" about how traffic is placed on and removed from the ring, at least for unicast traffic.  Still, the ring architecture is not ideal for high performance.)
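
To make "source stripping" concrete, here's a toy model (plain Python, not anything Cisco ships; the member count and ring speed are illustrative): because every frame stays on the ring until it returns to the member that injected it, each ring segment ends up carrying every member's traffic, so keeping traffic local to one physical switch buys you nothing.

    # Toy model of the original StackWise ring described above: every frame
    # goes onto the ring and is removed only when it returns to the member
    # that sourced it ("source stripping"), even when the source and
    # destination ports sit on the same physical switch.

    MEMBERS = 4      # hypothetical 4-switch stack
    RING_GBPS = 16   # per-direction ring bandwidth (marketed as 32 Gbps duplex)

    def segment_load(offered_per_member_gbps):
        """Load on each ring segment when every frame makes a full lap.

        A source-stripped frame crosses every segment, so each segment
        carries the combined traffic of ALL members, regardless of whether
        the destination was local to the sender."""
        return MEMBERS * offered_per_member_gbps

    for offered in (1, 2, 4, 8):
        load = segment_load(offered)
        note = "ring saturated" if load > RING_GBPS else "fits"
        print(f"{offered} Gbps offered per member -> {load} Gbps on every segment ({note})")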

The later StackWise Virtual (originally introduced as VSS on the sup720C or sup720-VSS) is designed to double performance/capacity.  Basically, two like switches, originally the larger chassis models, are designed to be used in parallel, each processing half the traffic.

In this latter architecture, ideally, all connections to the pair are dual connections, often EtherChannel.  Ideally, there is NO data traffic between the pair.  However, the pair will pass data traffic "sideways", but only if there's no other path.

In some respects, this architecture works like the Nexus vPC architecture, especially for performance/capacity.  I recall (?) Nexus vPC predates Catalyst VSS, and if so, why have both?  Multiple reasons come to mind, but perhaps, considering the differences between Nexus and Catalyst, Cisco perceived a desire for StackWise features and/or Nexus performance/capacity on the larger Catalyst switches, and thought to meet such a desire.  (Cisco, BTW, continued to expand Catalyst to somewhat further mimic Nexus FEXs using Catalyst IAs, but they weren't on the market very long.)

Could Cisco extend StackWise Virtual to use more than two devices?  I'm sure they could, but since it can be used on large chassis switches (and considering Catalyst IA didn't seem to do well), Cisco probably doesn't see the value in doing so.

So, for access/edge devices, classic StackWise serves well.  BTW, in various situations StackWise can decrease performance/capacity; it might also increase it, but it needs to be used carefully.

For core or distribution roles, VSS (now known as StackWise Virtual) roughly doubles the performance/capacity.

Hi Joseph, 

Thank you so much for your input—I really appreciate the detailed explanation and background on Cisco stacking technologies.

Referring to your explanation:

"Ideally, all connections to the pair are dual connections, often EtherChannel. Ideally, there is NO data traffic between the pair. However, the pair will pass data traffic 'sideways,' but only if there's no other path."

If I replace the StackWise Virtual pair with a StackWise stack, I could still set up dual connections using MEC, so the main difference lies in the data traffic between the stacked switches. If StackWise Virtual has no data traffic between the pair under ideal conditions, I agree its performance would presumably surpass that of StackWise.

However, I'm trying to understand how StackWise Virtual achieves NO data traffic between switches in practice. From what I studied in the white paper, if there is multicast traffic, it still goes across the link interconnecting the peer switches in StackWise Virtual (which is the same behavior as StackWise). Similarly, wouldn't unicast traffic behave in the same way, given that both technologies use SSO and NSF? In other words, if the route is located on the peer switch, the traffic should still travel across the interconnecting link, or is there some other mechanism that prevents this from happening, so that we can say StackWise Virtual achieves better performance?

Again, tons of appreciation for spending your time reading this and answering my question. 

Best regards,

Angie

VSS/StackWise Virtual has an affinity for using its own member ports for egress.

For example, when using EtherChannel, each member device does not hash flows across all the ports in the bundle; it only hashes flows across its own ports, unless it has none.

It does something similar for L3; I recall not only for ECMP, but even when the peer has a better path.

Basically, the peer is not considered for egress unless it's the only L2/L3 path.
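
Here's a toy sketch of that egress affinity (plain Python, not Cisco's actual hashing code; the port names and CRC hash are stand-ins): given a Multichassis EtherChannel whose member links are split across the two chassis, the forwarding chassis hashes flows only across its own local links and falls back to the peer's links only when it has none left.

    # Toy illustration of VSS/StackWise Virtual egress affinity on a
    # Multichassis EtherChannel (MEC). The CRC hash is a stand-in for the
    # platform's real src/dst hash; port names are made up.

    import zlib

    def pick_egress(flow, bundle, forwarding_chassis):
        """bundle: list of (chassis_id, port_name) tuples in the port-channel."""
        local = [port for chassis, port in bundle if chassis == forwarding_chassis]
        remote = [port for chassis, port in bundle if chassis != forwarding_chassis]
        candidates = local or remote   # peer's links only if no local link is left
        return candidates[zlib.crc32(flow.encode()) % len(candidates)]

    mec = [(1, "Te1/0/1"), (1, "Te1/0/2"), (2, "Te2/0/1"), (2, "Te2/0/2")]

    # Chassis 1 forwards the flow out one of its OWN links:
    print(pick_egress("10.1.1.1->10.2.2.2", mec, forwarding_chassis=1))

    # Only if chassis 1 has no local links left does it use chassis 2's
    # links (which means crossing the SVL peer link):
    print(pick_egress("10.1.1.1->10.2.2.2", [(2, "Te2/0/1"), (2, "Te2/0/2")], 1))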

Why is this done?  Consider all the slot bandwidth on a chassis and say you want to send half of it sideways via the peer.

Regarding multicast, as far as I know, it's the same for L2.  For L3, multicast isn't multipath.

Again, the peer data link doesn't refuse data traffic; it just should only need to be used when a member has no egress interface of its own, physically or (for multicast) logically.

Also again, why not use the peer link routinely?  Firstly, it usually doesn't provide sufficient bandwidth for what a chassis might direct to it.  Secondly, it would add another hop to the path, adding latency.

Classic StackWise attempts to deal with the cross-member bandwidth issue by having the ring provide lots of bandwidth, which, depending on the number and model of member switches, might not be oversubscribed; but it adds latency and likely doesn't try to minimize member hops.

BTW, could you provide a reference for multicast routinely using the peer path?  Possibly that's in reference to L3, as multicast is not multipath.

Hi Joseph, 

Thanks again for the clear explanation. Regarding multicast using the peer path, I was actually referring to the data flow for L2 multicast, and thanks to your response I now understand that the two technologies don't differ much from an L2 perspective (which is the reason I was confused in the beginning).

(White paper: https://www.cisco.com/c/en/us/products/collateral/switches/catalyst-9000/nb-06-cat-9k-stack-wp-cte-en.html)

Quote "For traffic that must be flooded on the VLAN (broadcasts, multicasts, and unknown unicasts), a copy is sent across the StackWise Virtual link to be sent out to any single-homed ports belonging to the VLAN." Unquote

After reading all the explanations, below is my current understanding; I hope it aligns with yours.

From a software perspective, the primary difference between StackWise and StackWise Virtual lies in their handling of Layer 3 traffic. StackWise uses source stripping for multicast traffic, while StackWise Virtual relies on IGMP/PIM and exchanges multicast forwarding information via the MFIB. (I believe StackWise Virtual can use IGMP to eliminate L2 multicast traveling between the peers as well, and it seems that StackWise cannot run IGMP between the active switch and a member switch to reduce multicast traffic on the backplane.)

For unicast traffic, since StackWise treats all member ports as if they belong to the same switch, if the destination route is reachable through a port on another switch in the stack, the traffic will traverse the backplane to reach that port. In contrast, StackWise Virtual may (?) assign a higher metric to the peer connection, making the peer path preferable only when no other Layer 2 or Layer 3 path is available.

I'll check whether I can find an exact example or reference online, but at least I now have a clearer picture and know why. Thanks a lot for your input.

Best regards,

Angie 

Regarding L2 multicast and VSS/SVL: remember, when using that technology, all downstream devices should have connections to both peers, so for downstream traffic there should hopefully be no need for such a stream to go sideways; but even if it (or a broadcast) does, it is replicated and usually is not a huge bandwidth consumer across the peer link.

With the two technologies, in a classical 3-tier network where core-to-distro is L3 and distro-to-access is L2, and since L3 doesn't benefit much, you might have neither on the core, SVL on distro, and StackWise at access.  That's a rough generalization; the two technologies can often be interchanged, although the choice is often platform limited.

Deciding between the two often goes hand in hand with platform selection, or with whether to use them at all.  For example, at the access edge, do you use a stack or a chassis, as the latter often provides more redundancy options?  Or, if the edge supports L3, does the distro, or a collapsed core, need either?

Often, the most important consideration when designing networks is budget.  That may be a big factor in choosing either technology, or in being able to choose either at all.

Leo Laohoo
Hall of Fame

StackWise Virtual is just a marketing term, and it is very much the same as VSS.  VSS is plain "classic" IOS, while StackWise Virtual is IOS-XE.