We recently began testing VMware ESXi 5.0 on our production network. After observing heavy discards (3-10 million at times) on the 10G uplinks FROM our core 6509s TO the Nexus 5Ks, we began investigating. We started by capturing traffic on the vPCs from the Nexus 5Ks to the 6509s and found a tremendous amount of unicast vMotion traffic being transmitted from the 6509s to the Nexus 5Ks. Unicast vMotion traffic should never touch the 6509 core switches, since it is layer 2 traffic. We found that our problem was twofold.
Problem number one was that on the ESXi 5 test cluster we had the vMotion and management VMkernel NICs in the same subnet. This is a known issue in which ESXi replies using the management virtual MAC address instead of the vMotion virtual MAC address. The switch therefore never learns the vMotion virtual MAC address and floods all of the vMotion traffic. We fixed problem number one by creating a new subnet for the vMotion VMkernel NICs and a new isolated VLAN across the Nexus 5Ks that does not extend to the cores, modifying the vDistributed switch port group as necessary.
To verify that the vMotion traffic was no longer flooding, we captured traffic locally on the N5K, not using SPAN but simply eavesdropping on the vMotion VLAN from an access port. The testing procedure involved watching the CAM table on the 5K, waiting for the vMotion MAC addresses to age out, then starting a vMotion from one host to another. That testing exposed problem number two: using this process we were able to consistently capture flooded vMotion traffic on the spectator host doing the captures. The difference from problem number one was that the flooding did not include the entire vMotion conversation as before. When vMotioning 1-2 servers we saw anywhere from 10 ms to 1 full second of flooding, and then it would stop. The amount of flooding varied, but depended greatly on whether or not the traffic traversed the vPC between the 5Ks.
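The learning failure behind problem number one can be illustrated with a toy model. The following is a minimal sketch, not a model of NX-OS: the `LearningSwitch` class, the port numbers, and the MAC labels such as `B-mgmt` are all made up for illustration. It only shows that a switch learns from source addresses, so a MAC that never appears as a source is never learned and every frame sent to it floods.

```python
class LearningSwitch:
    """Toy layer-2 switch: learns source MACs, floods unknown destinations."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.cam = {}  # MAC label -> port (hypothetical CAM table)

    def receive(self, in_port, src_mac, dst_mac):
        """Process one frame and return the set of ports it is sent out of."""
        self.cam[src_mac] = in_port        # learning uses the source address only
        if dst_mac in self.cam:
            return {self.cam[dst_mac]}     # known unicast: forwarded to one port
        return self.ports - {in_port}      # unknown unicast: flooded


def run_vmotion(reply_src_mac, frames=5):
    """Count flooded frames sent from host A to host B's vMotion MAC."""
    switch = LearningSwitch(ports=[1, 2, 3, 4])
    flooded = 0
    for _ in range(frames):
        out_ports = switch.receive(1, "A-vmotion", "B-vmotion")
        if len(out_ports) > 1:
            flooded += 1
        # Host B answers; the source MAC it uses decides what the switch learns.
        switch.receive(2, reply_src_mac, "A-vmotion")
    return flooded


# Same subnet: B replies from its management MAC, so B-vmotion is never learned.
print(run_vmotion("B-mgmt"))      # 5 (every frame floods)
# Separate subnet: B replies from its vMotion MAC, so only the first frame floods.
print(run_vmotion("B-vmotion"))   # 1
```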
We were able to make the flooding much worse by forcing the traffic across the vPC between the N5Ks.
Has anyone else observed this behavior with N5Ks or VMware on another switching platform?
We were able to eliminate the vMotion flooding by pinging both vMotion hosts before beginning the vMotion. It seems that if VMware were to set up a ping to verify connectivity between the vMotion hosts before starting the vMotion, it would eliminate the flooding.
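Why a ping beforehand helps can be sketched with a toy switch that ages out CAM entries. This is only an illustration under stated assumptions: the `AgingSwitch` class, the 300-second aging time, the 1000-second idle period, and the idea that the destination host sends its first frame back only after `reply_after` data frames are all hypothetical, chosen to mimic a short burst of flooding that then stops.

```python
class AgingSwitch:
    """Toy layer-2 switch with CAM aging (all values are hypothetical)."""

    def __init__(self, ports, aging_time):
        self.ports = set(ports)
        self.aging_time = aging_time  # seconds; None means entries never age out
        self.cam = {}                 # MAC label -> (port, last_seen)

    def receive(self, now, in_port, src_mac, dst_mac):
        """Process one frame at time `now`; return the set of output ports."""
        self.cam[src_mac] = (in_port, now)        # learn or refresh the source
        entry = self.cam.get(dst_mac)
        if entry is not None:
            port, last_seen = entry
            if self.aging_time is None or now - last_seen <= self.aging_time:
                return {port}                     # fresh entry: unicast
            del self.cam[dst_mac]                 # stale entry: aged out
        return self.ports - {in_port}             # unknown destination: flood


def flooded_frames(aging_time, ping_first, burst=100, reply_after=10):
    """Count flooded vMotion frames after both hosts have been idle."""
    sw = AgingSwitch(ports=[1, 2, 3, 4], aging_time=aging_time)
    sw.receive(0, 1, "A-vmotion", "B-vmotion")    # both MACs are seen at t=0
    sw.receive(0, 2, "B-vmotion", "A-vmotion")
    start = 1000                                  # idle longer than the aging time
    if ping_first:
        sw.receive(start, 1, "A-vmotion", "B-vmotion")  # echo request (one small frame)
        sw.receive(start, 2, "B-vmotion", "A-vmotion")  # echo reply re-teaches B's MAC
    flooded = 0
    for i in range(burst):
        if len(sw.receive(start + 1, 1, "A-vmotion", "B-vmotion")) > 1:
            flooded += 1
        if i == reply_after - 1:                  # B's first frame back (assumed delay)
            sw.receive(start + 1, 2, "B-vmotion", "A-vmotion")
    return flooded


print(flooded_frames(aging_time=300, ping_first=False))   # 10
print(flooded_frames(aging_time=300, ping_first=True))    # 0
print(flooded_frames(aging_time=None, ping_first=False))  # 0
```

In this sketch the flooding lasts only until the destination sends a frame of its own, which is why a ping exchange beforehand, or entries that never age out, removes it.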
A brief description of the network:
Two 6509 core switches with layer 2 down to two Nexus 5020s running NX-OS version 5.0(3)N2(2b), using 2232PP FEXes for top-of-rack. For testing purposes each ESXi host is dual-homed with one 10G link (CNA) to each N5K through the FEX. VMware is using a vDistributed switch with a test port group defined for the ESXi 5 boxes.
For curiosity's sake we also looked at packet captures from ESX 4.1, where we saw similar unicast flooding, although with not nearly as many packets as in ESXi 5.
We have cases open with TAC and VMware to track down the issue, but are curious whether anyone else has observed similar behavior or has any thoughts.
Essentially the fix was to (a) turn off MAC aging on the vMotion VLAN on the 5Ks, (b) remove the L3 addressing from the vMotion VLAN by not extending it to the 6509s, and for good measure (c) dedicate 2x10G ports per server just for multi-NIC vMotion. These three measures did the trick.