06-17-2014 03:30 AM - edited 03-01-2019 11:42 AM
Since the update to 2.2(1d) we have network timeouts / outages between vSphere VMs hosted on UCS B-Series servers and the rest of the world outside VMware / UCS. It first occurred with SLES 10 guests, but Windows 2008 and other guests are also affected. Any ideas or similar experiences?
07-08-2014 12:19 AM
I've informed my Cisco Support guy about this thread.
Let's see what happens. I would be pleased to hear from you if there is any news on this topic.
07-08-2014 06:41 AM
I'm curious. Are you running this on B-Series with M81KR VICs and ESX 5.5? We're on B230-M2s.
Cisco has asked that I pull it out and replace it with a VIC 1280. I thought we had some, but we don't. What I did do is move a couple of the problem VMs to a B200M3 with the VIC 1240, and the problem went away. Moved them back to the B230s and the drops resurfaced immediately.
We'll be picking this up again at noon. There is a newer version of the enic driver posted on VMware's site, 2.1.2.50. I might toss that on a blade and see.
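Side note for anyone comparing driver levels: this is roughly how I check the loaded enic version and push the newer offline bundle onto a host (vmnic0 and the datastore path are just placeholders for whatever applies in your environment):

# show the driver name/version that the NIC is currently using
esxcli network nic get -n vmnic0
# install the enic offline bundle downloaded from VMware, then reboot the host
esxcli software vib install -d /vmfs/volumes/datastore1/enic-2.1.2.50-offline_bundle.zip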
07-08-2014 11:38 PM
I still have enic driver version .42 running on my ESXi hosts.
I have these blades here, and I'll try all of them to confirm what you have seen:
B200M2 - M81KR
B230M2 - M81KR
B200M3 - VIC1240
Actually, the troubled VMs here are running on a B200M3 with VIC1240 and we see the problem there, so I'll move them around a bit and report back what I've seen.
My support engineer thinks the error lies with the 6500 core switches/routers, not UCS; maybe the ASA firewall module or VSS/hypervisor out-of-sync errors.
07-16-2014 11:46 AM
We ruled out the specific network card a few days ago. However, we did obtain stability and are running in a workaround state.
Each FI is connected to both our 7Ks (no vPC). We disabled one of those two links on each FI. Rock solid now.
So far Cisco hasn't been able to duplicate our issue. We're just sitting and waiting now.
07-09-2014 01:18 AM
Okay, tested all 3 blade types with the two adapters. It's the same on all blades: after about 50 seconds, ping losses occur.
Will have another phone call with Cisco today. VMware stated it's not their fault and closed the request for now.
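If anyone wants to reproduce the pattern, a timestamped ping from an affected guest is enough to see the roughly 50-second drop intervals; something like this (the gateway address is just an example):

# timestamp every ping line so the ~50 s loss pattern is visible
ping 192.168.1.1 | while read line; do echo "$(date '+%H:%M:%S') $line"; done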
08-05-2014 12:20 AM
Today I've had a long phone call and WebEx with Cisco.
What we've found out is the following:
The affected VMs (those whose MAC addresses show up on the wrong uplink interface of the upstream switch)
are using multicast packets. I've configured the following port pinning in UCS for FI A and FI B:
1/25 - Uplink for ESXi traffic (vMotion, HA, management console, etc.)
1/26 - Uplink for all the VM guest machines
All uplinks use the same VLANs of the provided trunk.
Multicast policy in UCS: IGMP snooping state enabled, querier state disabled (IMHO the defaults).
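For reference, the policy itself is only a couple of lines in the UCS Manager CLI; mine should correspond to something like this (the policy name is made up and the exact set keywords are from memory, so verify them against the CLI guide for your UCSM release before using them):

scope org /
create mcast-policy mcast-default
set snooping enabled
set querier disabled
commit-buffer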
What we see with "show ip igmp snooping vlan <id>" is that UCS always seems to use the first uplink that is able to handle IGMP as the querier port, here 1/25. But the VMs that are also doing multicast are pinned to uplink 1/26, so if there's not much traffic and the VMs are doing multicast, the MAC is learned on the wrong port at the upstream switch. I manually disabled and re-enabled uplink 1/25 to force the querier to be detected on port 1/26, and the error was gone.
Could anyone try to check if it's the same at your site?
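In case anyone wants to compare: this is roughly what I looked at, from the UCS CLI on the FI (VLAN 100 stands in for one of your VM VLANs):

connect nxos a
show ip igmp snooping vlan 100
show ip igmp snooping querier vlan 100
show ip igmp snooping mrouter vlan 100

That should show which uplink UCS has detected the querier / mrouter port on; on the upstream switch, "show mac address-table address <vm-mac>" (syntax differs slightly on older Catalysts) then shows on which port the VM's MAC was learned.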
Workarounds currently are:
1) Disable IGMP in UCS and in the VM guests
2) Disable / only use 1 uplink per FI
3) Disable all VM guest VLANs on the uplink for ESXi traffic and vice versa; in other words, only enable the VLANs per uplink that are absolutely needed
4) Swap the mac-pinning configs between the uplink ports
5) Bind the uplink ports into a port channel and use that as the pinned interface (see the sketch below)
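For workaround #5, the FI-side port channel would look roughly like this in the UCS Manager CLI (port-channel ID 10 and ports 1/25-1/26 are only my example values; please check the syntax against the CLI guide for your UCSM release):

scope eth-uplink
scope fabric a
create port-channel 10
create member-port 1 25
create member-port 1 26
enable
commit-buffer

The same again under fabric b, and the upstream switch needs a matching LACP port channel on its side.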
Cisco says it's not a bug but expected behavior. I cannot follow this, because if I a) do port pinning and b) want / need to use IGMP,
then IMHO the multicast traffic should flow through the uplinks configured via port pinning and not only through one (the first) of them. Could anybody with older firmware (2.1.x) try to send me the IGMP snooping querier results seen with the above-mentioned command?
So for now I do not know which of these options would be the best one for me :-(
Tending to do workaround #4. What now?
08-11-2014 04:36 PM
Hello,
I don't want this to be taken as a recommendation, but more as an informative post in response to your concerns about what the Cisco tech may have said in regard to it being expected behavior. Just for reference, this would be in line with option #3 you have listed above.
Layer 2 Disjoint Upstream Packet Forwarding in End-Host Mode
Server links (vNICs on the blades) are associated with a single uplink port (which may also be a PortChannel). This process is called pinning, and the selected external interface is called a pinned uplink port. The process of pinning can be statically configured (when the vNIC is defined), or dynamically configured by the system.