08-04-2020 06:45 PM
I have a question regarding the load-balancing methods in ESXi 6.7. I can't find a definitive answer, believe it or not.
Setup: UCS C-Series (C220/C240 M5) with VICs -> 6454 Fabric Interconnects -> Nexus 9K.
Each UCS C-Series host has one 25 Gb NIC into Fabric-A and one into Fabric-B.
Each Fabric Interconnect runs an LACP port-channel (2x 40 Gb) to each Nexus 9K, carrying the VLANs needed.
vPCs are configured on the N9Ks corresponding to each LACP port-channel (2x 40 Gb) from each Fabric Interconnect.
Each host runs a VDS with one uplink into each Fabric.
Each host is presented multiple vmnics.
All works fine, but what I would like a clear answer on is: can I use IP hash for the load balancing? Balancing by port ID works, but I need to distribute traffic more evenly. I'm currently using "route based on physical NIC load", which is working great, but apparently that is not supported either. Can anyone definitively confirm these questions?
Thanks
08-05-2020 03:46 AM - edited 08-05-2020 06:38 AM
You cannot use LACP/IP hash for your ESXi dVS teaming settings because the Fabric Interconnects do not share a data or control plane, which is required for switch-dependent bonding configurations.
Thanks,
Kirk..
08-05-2020 04:24 PM
@Kirk J OK, thank you. So LAG on the dVS and IP hash are out.
When it says "not supported" for physical NIC load balancing, is that from a support perspective, or should it simply not work?
08-05-2020 05:07 PM
It may "work", but your network will likely have a bad time.
Upstream of the UCS Fabric Interconnects pair are two distinct port-channels.
As traffic (read: the source MAC address) moves between FI-A and FI-B, there WILL be an upstream MAC flap.
If this occurs every once in a while, no problem; if it occurs frequently, the network may not be happy.
Some upstream platforms (Nexus) will stop learning that specific MAC address for some time, effectively "pinning" the MAC to one of the port-channels.
Some upstream platforms (ACI) will disable a port-channel after numerous MAC-flap occurrences.
What problem are you trying to solve?
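The MAC-flap behavior described above can be sketched as a toy simulation. Everything here is illustrative: the flap threshold and the pinning behavior are stand-in assumptions, not actual NX-OS parameters.

```python
# Toy model of an upstream switch's MAC learning table.
# FLAP_LIMIT is a hypothetical threshold, not an actual NX-OS value.
FLAP_LIMIT = 3

def learn(mac_table, flap_counts, mac, port_channel):
    """Learn a MAC; report 'flap' when it moves, 'pinned' once frozen."""
    if flap_counts.get(mac, 0) >= FLAP_LIMIT:
        return "pinned"                 # stop re-learning this MAC
    prev = mac_table.get(mac)
    mac_table[mac] = port_channel
    if prev is not None and prev != port_channel:
        flap_counts[mac] = flap_counts.get(mac, 0) + 1
        return "flap"
    return "learned"

# A VM whose frames alternate between FI-A (Po1) and FI-B (Po2):
table, flaps = {}, {}
events = [learn(table, flaps, "aa:bb", po) for po in ["Po1", "Po2"] * 4]
# events: one 'learned', then three 'flap's, then 'pinned' for the rest
```

Once pinned, traffic arriving via the other fabric is forwarded to the wrong port-channel, which is why frequent flapping hurts.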
08-05-2020 07:00 PM
@Steven Tardy wrote: It may "work", but your network will likely have a bad time. [...] What problem are you trying to solve?
Funnily enough, the network responds better overall with physical NIC load as the teaming method, and I'm trying to work out why. I'm going to try MAC hash, but either way I'm trying to spread traffic more evenly over the links, as we are exhausting buffers on the Fabric Interconnects and the Nexus. I'm testing more QoS on the interfaces (DPP, AFD, etc.), but I might just have to add more links in the end, even though we are nowhere near line rate, which is extremely frustrating. Route based on originating port ID does not load traffic evenly, IP hash should but we can't use it, and LAG is out.
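For reference, "Route based on IP hash" picks an uplink deterministically per source/destination IP pair. A minimal sketch of that idea follows; the XOR-modulo hash here is an approximation, and ESXi's exact hash may differ:

```python
import ipaddress

def ip_hash_uplink(src_ip: str, dst_ip: str, n_uplinks: int) -> int:
    """Pick an uplink by XOR of src/dst IP, modulo uplink count.
    Approximation of 'Route based on IP hash'; the key property is
    that the choice is deterministic per src/dst pair."""
    s = int(ipaddress.ip_address(src_ip))
    d = int(ipaddress.ip_address(dst_ip))
    return (s ^ d) % n_uplinks

# Many distinct src/dst pairs spread across the uplinks, but a single
# pair always lands on the same uplink -- no per-flow rebalancing.
flows = [("10.0.0.1", f"10.0.1.{i}") for i in range(1, 9)]
picks = [ip_hash_uplink(s, d, 2) for s, d in flows]
```

This is why IP hash only evens out traffic when there are many concurrent IP pairs; a single heavy pair still saturates one uplink.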
08-06-2020 11:04 AM
Check whether there are discards when the vNIC hands off to the OS, using this ESXi command:
/usr/lib/vmware/vm-support/bin/nicinfo.sh | egrep "^NIC:|rx_no_buf"
If these counters increment while testing, you can increase the vNIC buffers that the OS/driver allocates.
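A small parser makes it easy to compare two samples of that command's output. The line format assumed here (`NIC: vmnicX`, `rx_no_buf: N`) is an approximation; exact formatting varies by driver.

```python
import re

def parse_rx_no_buf(text):
    """Parse `nicinfo.sh | egrep "^NIC:|rx_no_buf"` style output into
    {nic: total rx_no_buf}. Summing handles per-queue counter lines."""
    counters, nic = {}, None
    for line in text.splitlines():
        if line.startswith("NIC:"):
            nic = line.split(":", 1)[1].strip()
        elif "rx_no_buf" in line and nic is not None:
            m = re.search(r"(\d+)\s*$", line)
            if m:
                counters[nic] = counters.get(nic, 0) + int(m.group(1))
    return counters

def discard_deltas(before, after):
    """NICs whose rx_no_buf grew between two samples (active discards)."""
    return {n: c - before.get(n, 0)
            for n, c in after.items() if c > before.get(n, 0)}
```

Sample once before the load test and once after; any NIC appearing in the delta dict is dropping receives for lack of buffers.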
08-06-2020 12:58 PM - edited 08-06-2020 06:46 PM
Without switch-assisted bonding, you can only get TX load balancing.
Trying to set up switch-assisted bonding will likely create a mismatch in the hashing (which MACs or IPs the host or FI sends a particular traffic stream through). That makes for complex troubleshooting, where some sessions' hashes happen to match (and traffic passes correctly) while others don't match (and traffic is black-holed).
Kirk....
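The mismatch described above can be illustrated with two independent hash functions standing in for the host's teaming hash and the upstream port-channel hash. Neither function is a real algorithm; the point is only that uncoordinated hashes disagree on some flows.

```python
# Two arbitrary stand-in hash functions. When their link choices
# disagree for a flow, that flow's traffic can be black-holed; when
# they agree, it works -- which is what makes troubleshooting so
# confusingly intermittent.

def host_pick(flow, n=2):
    key = "".join(map(str, flow))
    return sum(map(ord, key)) % n        # stand-in for host teaming hash

def switch_pick(flow, n=2):
    h = 0
    for c in "".join(map(str, flow)):
        h ^= ord(c)
    return (h >> 1) % n                  # stand-in for upstream hash

flows = [("10.0.0.1", f"10.0.1.{i}", 443) for i in range(8)]
black_holed = [f for f in flows if host_pick(f) != switch_pick(f)]
# some flows agree (pass), some disagree (black-holed)
```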
08-07-2020 11:02 AM
Are you trying to solve "slow network" issues?
If so, there are several things I'd check:
OS power settings: set high performance, disable C-states.
Change the UCS vNIC queues, following the link I posted before.
If this is a TCP application, the issue could very well be an application/OS issue.
Get packet captures to determine whether this is a TCP/application issue rather than a hardware/infrastructure issue.