
IP hash load balancing UCS-C Series with 6454 Fabric

cmturner20
Level 1

I have a question regarding the load balancing methods in ESXi 6.7. I can't find a definitive answer, believe it or not.

 

The setup is UCS C-Series (C220/C240 M5) with VICs -> 6454 Fabric Interconnects -> Nexus 9Ks.

 

Each UCS C-Series host has one 25 Gb NIC into Fabric-A and one into Fabric-B.

Each Fabric Interconnect runs an LACP port-channel (2x40 Gb) to the Nexus 9K pair, carrying the VLANs needed.

vPCs are configured on the N9Ks corresponding to each LACP port-channel (2x40 Gb) from each Fabric Interconnect.

Each host runs a VDS with one uplink into each Fabric.

Each host is presented multiple vmnics.
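
Roughly, each N9K side of one of those FI port-channels looks like the below (interface, VLAN, and port-channel/vPC numbers are just placeholders for illustration, not the real config):

interface port-channel11
  description LACP/vPC to FI-A (2x40 Gb across the 9K pair)
  switchport mode trunk
  switchport trunk allowed vlan 100-110
  vpc 11

interface Ethernet1/49
  description 40 Gb member link to FI-A
  switchport mode trunk
  switchport trunk allowed vlan 100-110
  channel-group 11 mode active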

 

All works fine, but what I would like a clear answer on is: can I use IP hash for the load balancing? Balancing by originating port ID works, obviously, but I need to distribute traffic more evenly. I'm currently using route based on physical NIC load, which is working great, but apparently that is not supported either. Can anyone definitively confirm these points?

 

Thanks

9 Replies

Kirk J
Cisco Employee

You cannot use LACP/IP hash for your ESXi dVS teaming settings, because the FIs do not share a data or control plane, which is required for switch-dependent bonding configurations.

See https://www.cisco.com/c/en/us/support/docs/servers-unified-computing/ucs-b-series-blade-servers/200519-UCS-B-series-Teaming-Bonding-Options-wi.html

 

Thanks,

Kirk..

@Kirk J OK, thank you. So LAG on the dVS and IP hash are out.

 

So when it says "not supported" for physical NIC load balancing, is that from a support perspective, or should it just not work?

It may "work", but your network will likely have a bad time.

Upstream of the UCS Fabric Interconnect pair are two distinct port-channels.

As the traffic (read: MAC address) moves between FI-A and FI-B, there WILL be an upstream "MAC flap".

If this occurs every once in a while, no problem; but if this occurs frequently, the network may not be happy.

Some upstream technologies (Nexus) will stop learning that specific MAC address for some time and effectively "pin" the MAC to one of the port-channels.

Some upstream technologies (ACI) will disable a port-channel after numerous MAC flap occurrences.
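
If you want to confirm whether that is actually happening, something along these lines on the N9Ks will show it (the MAC below is a placeholder for one of your VM MACs):

! log a message whenever a MAC moves between interfaces
configure terminal
  mac address-table notification mac-move

! then check which port-channel that MAC is currently learned on
show mac address-table address 0050.56aa.bbcc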

 

What problem are you trying to solve?


@Steven Tardy wrote:

It may "work", but your network will likely have a bad time.

Upstream of the UCS Fabric Interconnect pair are two distinct port-channels.

As the traffic (read: MAC address) moves between FI-A and FI-B, there WILL be an upstream "MAC flap".

If this occurs every once in a while, no problem; but if this occurs frequently, the network may not be happy.

Some upstream technologies (Nexus) will stop learning that specific MAC address for some time and effectively "pin" the MAC to one of the port-channels.

Some upstream technologies (ACI) will disable a port-channel after numerous MAC flap occurrences.

 

What problem are you trying to solve?


Funnily enough, the network responds better overall with physical NIC load as the teaming method, and I'm trying to work out why. I'm going to try MAC hash, but either way I'm trying to spread traffic more evenly over the links, as we are exhausting buffers on the Fabric Interconnects and the Nexus switches. I'm testing more QoS on the interfaces (DPP, AFD, etc.), but I might just have to add more links in the end, even though we are nowhere near line rate, which is extremely frustrating. Route based on originating virtual port does not load the traffic evenly; IP hash should, but we can't use it, and LAG is out.
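
For context, the drops I'm referring to are the usual output-discard and queuing counters on the N9Ks, checked with commands like these (port-channel and interface numbers are placeholders):

show interface port-channel11 | include discard
show queuing interface ethernet 1/49
show interface counters errors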

 

 

Check to see if there are discards when the vNIC hands off to the OS, using this ESXi command:

/usr/lib/vmware/vm-support/bin/nicinfo.sh  | egrep "^NIC:|rx_no_buf"

If these counters are incrementing while testing, then you can increase the vNIC buffers that the OS/driver allocates.

See: https://www.cisco.com/c/dam/en/us/products/collateral/interfaces-modules/unified-computing-system-adapters/vic-tuning-wp.pdf
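
If it isn't obvious whether those counters are actively climbing (as opposed to being old totals since boot), you can sample the same command in a loop from the ESXi shell while generating load, for example:

# print a timestamped sample of the per-vmnic rx_no_buf counters every 30 seconds
while true; do
  date
  /usr/lib/vmware/vm-support/bin/nicinfo.sh | egrep "^NIC:|rx_no_buf"
  sleep 30
done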

Yes, I went through that article a while ago and am currently maxing out the queues/buffers/RSS settings in UCSM :)

rx_no_bufs: 176200
NIC: vmnic11
rx_no_bufs: 0
NIC: vmnic2
rx_no_bufs: 0
NIC: vmnic3
rx_no_bufs: 0
NIC: vmnic4
rx_no_bufs: 0
NIC: vmnic5
rx_no_bufs: 0
NIC: vmnic6
rx_no_bufs: 0
NIC: vmnic7
rx_no_bufs: 0
NIC: vmnic8
rx_no_bufs: 4514092
NIC: vmnic9
rx_no_bufs: 6282912

They aren't incrementing on every refresh, but I'll check it today. Even with RSS/queues maxed out, we still get drops, unfortunately.

Without switch-assisted bonding, you can only have TX load balancing.

Trying to set up switch-assisted bonding will likely create a mismatch in hashing (which MACs or IPs the host or FI sends a particular traffic stream through), and it makes troubleshooting complex when some sessions' hashes match (and traffic is passed correctly) while others don't match (and traffic is black-holed).

 

Kirk....

Are you trying to solve "slow network" issues?

If so, there are several things I'd check:

  OS power settings: set to high performance; disable C-states.

  Change UCS vNIC queues following the link I posted before.

  If this is a TCP application, then the issue could very well be an application/OS issue.

 

Get packet captures to determine whether this is a TCP/application issue rather than a hardware/infrastructure issue.
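
A quick way to grab one directly on the host is pktcap-uw from the ESXi shell, for example (the vmnic, packet count, and output path are just examples):

# capture 10000 frames on one uplink into a pcap you can open in Wireshark
pktcap-uw --uplink vmnic2 -c 10000 -o /tmp/vmnic2.pcap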

I'm just trying to solve these drops, which seem to happen for no real reason. I believe the network will be better off without them, given that we are quite sensitive to latency with VDI. I'm going to try more QoS and see if the AFD profile (mesh vs. ultra-burst) helps.
The OS is set to high performance in the profiles, and C-states are disabled in ESXi.
The vNICs are maxed out in ESXi and in the adapter profile.
Thanks for the help so far.