
Distributed Switch/vNIC Oddity

JOHN GOUCHER
Level 1

Very strange issue.  I have 5 ESXi servers in UCS.  There are also a number of VLANs used by VM servers connected to virtual distributed switches (we use the Palo CNA) that are then passed to the vSphere server.  Everything had been working fine until recently, when I created a private switch with a VLAN tag that only exists in the UCS system.  It's not on our core, as I don't need or want the traffic past the UCS system.  It is basically used as the private network for a Microsoft Failover Cluster configuration.
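For context, a VLAN that exists only inside UCS is defined solely under the LAN cloud, something along the lines of the following UCSM CLI sketch (the VLAN name and ID below are placeholders; the GUI equivalent lives under LAN > LAN Cloud > VLANs):

    UCS-A# scope eth-uplink
    UCS-A /eth-uplink # create vlan Cluster-Private 200
    UCS-A /eth-uplink/vlan* # commit-buffer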

What is odd is that there are 5 Windows 2008 R2 servers, each with a NIC on a normal routable VLAN and a NIC on this non-routable private VLAN.  There are no network communication issues between servers on the routable interfaces, but on the private interface 2 of the 5 servers cannot ping, or be pinged by, any server other than each other.

Server1 can ping S2, S5; cannot ping S3, S4

Server2 can ping S1, S5; cannot ping S3, S4

Server3 can ping S4; cannot ping S1, S2, S5

Server4 can ping S3; cannot ping S1, S2, S5

Server5 can ping S1, S2; cannot ping S3, S4

I have removed and re-added the NIC in the VM itself for both S2 and S3, with no change in behavior.  A different VM had another NIC added on this private VLAN, and it behaved like S3 and S4.

From what I understand of Ethernet, this should not be happening.  Does anyone have any clues as to where I should look for a solution?  Note that I am running UCS 1.3.1n and VMware ESXi 4.1.

TIA

2 Replies

mipetrin
Cisco Employee

Hi John,

From your description, what you're seeing could be considered expected behaviour, based on a few things.

It seems as though servers 1, 2, and 5 are pinned to one Fabric Interconnect, while servers 3 and 4 are pinned to the other Fabric Interconnect.

Since you have only defined this particular VLAN X within the UCS cluster, and NOT on any of the northbound switches, that is what's causing your issue. By default, the Fabric Interconnects run in End Host Mode, and any side-to-side communication between blades MUST traverse the upstream core network, because no data traffic is passed over the Fabric Interconnect cluster links.

So what seems to be happening in your scenario is this: when server 1 (e.g. pinned to Fabric A) attempts to communicate with server 3 (e.g. pinned to Fabric B) via VLAN X, the packet leaves server 1 towards Fabric A. Once it hits Fabric A, there is no path to send the traffic towards server 3, as VLAN X is not defined northbound and it cannot traverse the cluster links. On the other hand, when server 1 attempts communication with server 2 (e.g. also pinned to Fabric A) via VLAN X, once the packet hits Fabric A, it knows how to get back to server 2 and correctly forwards the traffic. Additionally, all servers can communicate with each other via all other VLANs because those VLANs are defined northbound.
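If you want to double-check the pinning, one approach (a rough sketch; exact output varies by release) is to drop into the NX-OS shell on each Fabric Interconnect from the UCSM CLI and look at the pinning tables:

    UCS-A# connect nxos a
    UCS-A(nxos)# show pinning server-interfaces
    UCS-A(nxos)# show pinning border-interfaces

The server vNICs carrying VLAN X that show up on Fabric A versus Fabric B should line up with the two groups of servers you are seeing.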

Below is a link to some videos on Cisco UCS Networking which should hopefully clear up some concepts:
http://bradhedlund.com/2010/06/22/cisco-ucs-networking-best-practices/

A couple of options you could consider:
1) Extend the VLAN to the upstream switches, and only allow it on the trunks towards the UCS cluster and between each switch if using multiple upstream switches (a rough config sketch is below)
2) Configure the Fabric Interconnects to run in Switch mode
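For option 1, the upstream side would look roughly like the following on an NX-OS switch (the VLAN ID and interface here are placeholders for illustration only; adjust to your environment):

    vlan 200
      name MSCS-Private

    interface Ethernet1/10
      description Trunk to UCS Fabric Interconnect A
      switchport mode trunk
      switchport trunk allowed vlan add 200

    ! Repeat on the uplinks towards Fabric Interconnect B and on the
    ! link(s) between the upstream switches themselves.

Option 2 changes the forwarding behaviour of the whole UCS domain, so option 1 is usually the less disruptive route.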

Hope that helps clarify what you are experiencing.

Regards,
Michael

Michael,

This makes complete sense now that you mention it.  I did not even think about the fabric separation in this design.  I appreciate your feedback.

Thanks,

John
