VM Causes Output Errors

06-20-2014 10:19 AM - last edited on 03-25-2019 01:39 PM by ciscomoderator
I'm looking for some out-of-the-box thinking here.
Here's the setup:
vSphere 5.1 Cluster
4 x B200M2 in Chassis A
4 x B200M2 in Chassis B
Chassis A has 2104XP IOM and does not use port-channeled connectivity.
Chassis B has 2208XP IOM and DOES use port-channels.
Windows 2008R2 virtual machine, hardware version 8, VMXNET3 network interface.
Here's the problem:
If the VM is running on an ESXi host in Chassis A, the physical interface that server is pinned to will start clocking output errors slowly. If you move it to another host in the same chassis, the physical link for that host will start clocking errors. If you move it to a host in the other chassis, the port-channel starts clocking output errors.
Labels: Unified Computing System (UCS)

06-20-2014 10:31 AM
Which UCS version? Which adaptor is in the B200-M2?
Did you check that the enic/fnic and W2008 R2 drivers comply with the UCS version according to the UCS Interop Matrix?
06-20-2014 11:00 AM
Fabric Interconnects @ 2.1(3b), though the problem existed when the FI version matched the blades; we're in an interim upgrade phase.
Blade CIMC/BIOS/firmware @ 2.1(1d), eNIC 1.5.0.20, fNIC 2.1.2.38.
After the hosts are updated to 5.1U2 and firmware 2.1(3b), the eNIC will be updated to 1.5.0.45; the fNIC will not.
vSphere 5.1 GA/Patch 3 on 2.1(1d) with eNIC 1.5.0.20 and fNIC 2.1.2.38 matches the support matrix.

06-20-2014 11:05 AM
Thanks. Which I/O adaptor?
Can you please post the error message / counter that you see on the interface?
06-20-2014 11:10 AM
The blades have M81KRs; I should probably see whether a VIC 1240 does it as well, but I don't have that class of blade spare right now.
Here's what we see from NX-OS:
Ethernet1/1 is up
Hardware: 1000/10000 Ethernet, address: 0005.73fb.5a88 (bia 0005.73fb.5a88)
Description: S: Server
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA
Port mode is fex-fabric
full-duplex, 10 Gb/s, media type is 10G
Beacon is turned off
Input flow-control is off, output flow-control is off
Rate mode is dedicated
Switchport monitor is off
EtherType is 0x8100
Last link flapped 8week(s) 0day(s)
Last clearing of "show interface" counters never
30 seconds input rate 7072336 bits/sec, 884042 bytes/sec, 1683 packets/sec
30 seconds output rate 40788736 bits/sec, 5098592 bytes/sec, 4534 packets/sec
Load-Interval #2: 5 minute (300 seconds)
input rate 7.07 Mbps, 1.32 Kpps; output rate 10.95 Mbps, 1.93 Kpps
RX
13017205996 unicast packets 7010323 multicast packets 5459427 broadcast packets
13029675746 input packets 11016470049276 bytes
4321622131 jumbo packets 0 storm suppression packets
0 giants 0 input error 0 short frame 0 overrun 0 underrun 0 watchdog 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
15992467619 unicast packets 257608563 multicast packets 455953425 broadcast packets
16706140077 output packets 14865775850059 bytes
6553509485 jumbo packets
110470 output errors 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble
0 Tx pause
2 interface resets
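For scale, it's worth noting how small a fraction of the TX traffic those output errors are. A quick sketch using the counters from the NX-OS output above:

```python
# Numbers copied from the "show interface" TX counters above.
output_errors = 110_470
output_packets = 16_706_140_077

error_pct = 100 * output_errors / output_packets
print(f"{error_pct:.6f}% of transmitted packets errored")  # well under 0.001%
```

So the errors are real but vanishingly rare relative to total traffic, which fits the "cosmetic" characterization later in the thread.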

06-20-2014 11:24 AM
I see jumbo frames and multicast!
Which applications use jumbo frames and multicast, respectively?
Is jumbo framing properly configured on the UCS and/or the vSwitch / DVS / N1K?
E.g., did you check ping from the ESXi CLI with jumbo frames and the DF flag set?
Are you using vSwitch, DVS, or N1K?
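For reference (not from the thread), the arithmetic behind that jumbo-frame ping test: the payload size passed to ping/vmkping must leave room for the IP and ICMP headers inside the MTU, or the don't-fragment probe fails even on a correctly configured path.

```python
# A 9000-byte MTU minus the IPv4 header (20 bytes, no options) and the
# ICMP header (8 bytes) gives the largest unfragmented ping payload.
MTU = 9000
IP_HEADER = 20
ICMP_HEADER = 8

payload = MTU - IP_HEADER - ICMP_HEADER
print(payload)  # 8972

# From the ESXi shell the test would look roughly like this
# (10.0.0.1 is a placeholder target):
#   vmkping -d -s 8972 10.0.0.1
# -d sets the don't-fragment bit, so if jumbo frames aren't configured
# end to end the ping fails instead of silently fragmenting.
```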
06-20-2014 11:27 AM
We're using the VMware VDS, set to version 5.1.
We're not intentionally using jumbo frames anywhere; the QoS system class and vNICs are at the UCS defaults. Some of our VLANs have the IGMP querier turned on, and applications inside them do some basic clustering over multicast. The jumbo frames recorded are probably from first-hop FCoE traffic, I would think? We do FC connectivity straight to MDS, not through Nexus.

06-20-2014 11:41 AM
OK, thanks for the clarifications.
If I understand you correctly, the output errors show up between IOM and FI? Correct?
Are these errors seen on both fabrics A and B?
What kind of load balancing is set up on the DVS?
Do you see errors on the northbound uplinks from the FI as well?
And finally: are these output errors cosmetic, or are you having performance problems? If yes, IP and/or FC?
06-20-2014 12:04 PM
Yes, errors show up between the FI and IOM.
Errors can be seen on either A side or B side.
The VDS is set to route based on source port by default. As a test, I have also created duplicate port groups that have different active uplinks assigned. Example:
PG_ONE (Tag 1130) - both nics set as "active", the "normal" port group.
PG_ONE_A (Tag 1130) - vmnic0 Active, vmnic1 Passive
PG_ONE_B (Tag 1130) - vmnic0 Passive, vmnic1 Active
If I change the VM's port-group assignment from PG_ONE_A to PG_ONE_B, the errors move from the A side to the B side of UCS. The errors never show up on the LAN or SAN uplinks.
This is all cosmetic, but unfortunately the transmit errors are being picked up by Solarwinds SNMP monitoring of the fabric interconnect and are therefore tripping our error thresholds. Ideally that threshold catches "receive" errors, which are our key indicator of a bad cable between the FI and IOM.
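A side note on the alerting (not Solarwinds' actual logic, just a hedged sketch): SNMP error counters like ifOutErrors are monotonically increasing, so rate-based thresholds are normally computed on the delta between two polls rather than the absolute value. That's why even a slow, cosmetic trickle of errors eventually trips an alert.

```python
def error_rate(prev_errors, curr_errors, prev_pkts, curr_pkts):
    """Percent of packets errored during one polling interval,
    computed from two samples of monotonically increasing counters."""
    pkts = curr_pkts - prev_pkts
    if pkts <= 0:
        return 0.0
    return 100 * (curr_errors - prev_errors) / pkts

# Example with made-up samples: 50 new errors over 1M new packets.
print(error_rate(110_000, 110_050, 16_000_000_000, 16_001_000_000))  # 0.005
```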

06-20-2014 12:15 PM
Do you see any errors on vCenter and/or ESXi ?
esxcli network nic stats get -n vmnic0
-----------------------------------------------
Let me summarize: problem shows up
- on fabric A and/or B
- 2104 and 2208
- multiple blades in 2 chassis
- seen only between IOM and FI
- not seen Northbound of FI
06-20-2014 12:34 PM
The summary is accurate. I grabbed some stats off one of the hosts.
For reference:
vmnic0 and 1 are for Management only
vmnic2 and 3 are for VDS
vmnic5 and 6 are vMotion
~ # esxcli network nic stats get -n vmnic0
NIC statistics for vmnic0
Packets received: 183853406
Packets sent: 14248
Bytes received: 15109232248
Bytes sent: 10860364
Receive packets dropped: 0
Transmit packets dropped: 0
Total receive errors: 0
Receive length errors: 0
Receive over errors: 0
Receive CRC errors: 0
Receive frame errors: 0
Receive FIFO errors: 0
Receive missed errors: 0
Total transmit errors: 0
Transmit aborted errors: 0
Transmit carrier errors: 0
Transmit FIFO errors: 0
Transmit heartbeat errors: 0
Transmit window errors: 0
~ # esxcli network nic stats get -n vmnic1
NIC statistics for vmnic1
Packets received: 216684173
Packets sent: 78566218
Bytes received: 47589164213
Bytes sent: 83540882074
Receive packets dropped: 27
Transmit packets dropped: 0
Total receive errors: 17
Receive length errors: 0
Receive over errors: 0
Receive CRC errors: 17
Receive frame errors: 0
Receive FIFO errors: 0
Receive missed errors: 0
Total transmit errors: 0
Transmit aborted errors: 0
Transmit carrier errors: 0
Transmit FIFO errors: 0
Transmit heartbeat errors: 0
Transmit window errors: 0
~ # esxcli network nic stats get -n vmnic2
NIC statistics for vmnic2
Packets received: 2308445804
Packets sent: 1769198467
Bytes received: 760268005329
Bytes sent: 1007781294817
Receive packets dropped: 5801
Transmit packets dropped: 0
Total receive errors: 4
Receive length errors: 0
Receive over errors: 0
Receive CRC errors: 4
Receive frame errors: 0
Receive FIFO errors: 0
Receive missed errors: 0
Total transmit errors: 0
Transmit aborted errors: 0
Transmit carrier errors: 0
Transmit FIFO errors: 0
Transmit heartbeat errors: 0
Transmit window errors: 0
~ # esxcli network nic stats get -n vmnic3
NIC statistics for vmnic3
Packets received: 3515742297
Packets sent: 1260747637
Bytes received: 795264860939
Bytes sent: 382208295578
Receive packets dropped: 0
Transmit packets dropped: 0
Total receive errors: 179
Receive length errors: 0
Receive over errors: 179
Receive CRC errors: 0
Receive frame errors: 0
Receive FIFO errors: 0
Receive missed errors: 0
Total transmit errors: 0
Transmit aborted errors: 0
Transmit carrier errors: 0
Transmit FIFO errors: 0
Transmit heartbeat errors: 0
Transmit window errors: 0
~ # esxcli network nic stats get -n vmnic4
NIC statistics for vmnic4
Packets received: 4822791
Packets sent: 13
Bytes received: 660278761
Bytes sent: 832
Receive packets dropped: 0
Transmit packets dropped: 0
Total receive errors: 0
Receive length errors: 0
Receive over errors: 0
Receive CRC errors: 0
Receive frame errors: 0
Receive FIFO errors: 0
Receive missed errors: 0
Total transmit errors: 0
Transmit aborted errors: 0
Transmit carrier errors: 0
Transmit FIFO errors: 0
Transmit heartbeat errors: 0
Transmit window errors: 0
~ # esxcli network nic stats get -n vmnic5
NIC statistics for vmnic5
Packets received: 129866853
Packets sent: 43742076
Bytes received: 186885321654
Bytes sent: 51537911982
Receive packets dropped: 53669
Transmit packets dropped: 0
Total receive errors: 8
Receive length errors: 0
Receive over errors: 0
Receive CRC errors: 8
Receive frame errors: 0
Receive FIFO errors: 0
Receive missed errors: 0
Total transmit errors: 0
Transmit aborted errors: 0
Transmit carrier errors: 0
Transmit FIFO errors: 0
Transmit heartbeat errors: 0
Transmit window errors: 0
06-20-2014 12:42 PM
Kind of fascinating: I forgot for a second that I had just vMotioned the VM to another host. Going back to the one it's been running on for some time, we see many more receive over errors.
~ # esxcli network nic stats get -n vmnic2
NIC statistics for vmnic2
Packets received: 6443929963
Packets sent: 5603998409
Bytes received: 1958660312013
Bytes sent: 2551197563270
Receive packets dropped: 3560
Transmit packets dropped: 0
Total receive errors: 178192
Receive length errors: 0
Receive over errors: 178260
Receive CRC errors: 0
Receive frame errors: 0
Receive FIFO errors: 0
Receive missed errors: 0
Total transmit errors: 0
Transmit aborted errors: 0
Transmit carrier errors: 0
Transmit FIFO errors: 0
Transmit heartbeat errors: 0
Transmit window errors: 0
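Comparing the two vmnic2 snapshots in this thread (assuming both came from the same host, which the posts don't state explicitly), the jump in "receive over errors" — typically ring-buffer overruns — is still a tiny fraction of the traffic in between:

```python
# Fields that changed between the two vmnic2 snapshots above.
before = {"packets_rx": 2_308_445_804, "recv_over": 0}
after  = {"packets_rx": 6_443_929_963, "recv_over": 178_260}

new_pkts = after["packets_rx"] - before["packets_rx"]
new_over = after["recv_over"] - before["recv_over"]
print(f"{new_over} overruns over {new_pkts} packets "
      f"({100 * new_over / new_pkts:.4f}%)")  # well under 0.01%
```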
06-20-2014 01:13 PM
I'm actually seeing the behavior reproduced on two domain controllers in this same Active Directory, which reside at another datacenter with similar blades and configurations (including mixed FI/blade firmwares). Since that vSphere cluster is two nodes and is only running these two VMs, I'm updating one of the two hosts to 5.1U2 and UCS 2.1(3b) with newer fNIC/eNIC drivers and seeing where that gets me.

06-22-2014 01:14 AM
This might be related to the DVS and/or ESXi; see e.g.:
The output of esxtop shows dropped receive packets at the virtual switch (KB 1010071): http://kb.vmware.com/kb/1010071
vCenter Server 5.1 and 5.5 performance charts report dropped network packets (KB 2052917): http://kb.vmware.com/kb/2052917

06-24-2014 07:58 AM
Well... the firmware update to 2.1(3b) and ESXi 5.1U2 + patches did not resolve the issue. :|
