EMSITManager · 04-20-2016

We are currently having a problem with our high availability environment whereby the cluster loses quorum and all the VMs reboot when the CISCO master is power cycled. The relevant components in the environment are shown in the attached PDF file.

Server A and Server B both run Microsoft Windows Server 2012 R2. Also, not shown in the PDF: the servers are connected to a redundant SAN using MPIO.

The CISCO SG500Xs are running the latest firmware, version 1.4.2.4.

The LAG/TEAMs have the following parameters set:

CISCO SG500X LAG parameters from GUI:

Load Balance Algorithm: IP/MAC Address
Port Priority = 1
LACP Timeout = Long
Administrative Auto Negotiation: Enable
Administrative Flow Control: Disable
LACP: Enable

Windows Server 2012 R2 Hyper-V Cluster Team properties:

Teaming mode: LACP
Load balancing mode: Dynamic
Standby adapter: None

The problem is that when the CISCO master is power cycled, the cluster shuts down and all the VMs reboot. This is a major problem. The event log on the servers shows the NIC that is connected to the master going down, which it should. However, it also shows that the TEAM is no longer operational. This causes the node to lose connectivity to the other nodes in the cluster, and therefore the cluster shuts down and all the VMs reboot.

My understanding of LAG/TEAMS is that as long as 1 member of the LAG/TEAM is operational it should keep working. That’s the point of high availability.
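
That expectation can be written down as a tiny model. This is only a sketch of how a LAG/TEAM is expected to behave, not of what the SG500X stack or the Windows teaming driver actually does. The class name `Team` and the member names `nic_to_master` and `nic_to_other_unit` are made up for illustration, and it assumes each server has one team member on each stack unit.

```python
from dataclasses import dataclass, field


@dataclass
class Team:
    """Hypothetical model of a LAG/team: member links that are each up or down."""
    members: dict = field(default_factory=dict)  # member name -> link is up?

    def set_link(self, name: str, up: bool) -> None:
        self.members[name] = up

    def is_operational(self) -> bool:
        # Expected behaviour: the team stays up while at least one member is up.
        return any(self.members.values())


team = Team()
team.set_link("nic_to_master", True)      # assumed: one member on the stack master
team.set_link("nic_to_other_unit", True)  # assumed: one member on the other stack unit

# Power cycling the master should take down only the link attached to it.
team.set_link("nic_to_master", False)
print(team.is_operational())  # prints True: one member is still up
```

What the event log shows instead is the equivalent of `is_operational()` returning False while one member link should still be up.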

So I could use any help or comments as to what I might have configured incorrectly.

jramon002 · 01-26-2017

Did you ever figure this out?

EMSITManager · 01-26-2017

No, I have not been able to figure this out.

Today the master switch lost power due to a UPS failure. When I plugged the master back in, the entire stack did a cold reboot and, as you would expect, the entire cluster began to fail over.

I called Cisco Small Business support. I have all the equipment covered under a Small Business PRO Service contract, and Cisco basically stated they could not give me any support because there was a power failure. I have never heard anything so ridiculous.

I am sorry I ever recommended this Cisco equipment to my client, never mind having spent extra for the service contract.

jramon002 · 01-26-2017

I have a similar symptom. Upon rebooting my Hyper-V hosts, the LACP drivers seem to lock up the cluster and live migration team, causing the host to lock up.

AndraZ · 11-14-2018

Another sorry customer of these switches. This is not a stacking switch if it can't handle a power cycle of the master. I will never recommend this again.

To make matters worse, when this switch is on the same network as an Aruba Wi-Fi device, it resets itself because of some kind of firmware bug. And there goes the whole Hyper-V cluster again...

I can't believe how bad these switches were for me.

SG500x stack with teaming/LAG to Hyper-V cluster fails high availability