Solved: Etherchannel hash / bit-buckets and load balancing across member ports

cbeswick · ‎06-25-2013

Hi,

I have a question regarding how Etherchannel load balancing is performed across the bundle, and the issues one might face when dealing with bundles of 2, 4 or 8 links, or with bundles whose link count does not comply with the power-of-2 rule.

I have read a few articles explaining how the different platforms (Sup32, Sup720, Sup2T and Nexus M1/F1 modules) use a different number of bit buckets to spread traffic across the bundled links. For example, the Sup32, Sup720 and Nexus M1 modules all appear to use a 3-bit hash, which explains why you should only ever bundle Etherchannel links in powers of 2 to get an even spread of traffic. The Sup2T and Nexus F1 modules, by contrast, both have an 8-bit hash available, providing a more granular spread of traffic for any bundle size, including those that do not follow the power-of-2 rule.

What I am struggling to understand is where the configurable load-balancing hash in the configuration then comes into play, i.e. src-dst mac, src-dst ip, etc.

Do the bit buckets merely provide the underlying mechanism for the theoretical spread of traffic, all things being equal, with the configurable hashing algorithm merely providing a "tweak" to how the packets are balanced? I usually get myself confused at this point trying to figure it out, so I need someone with a brain better accustomed to mathematics than mine.

Thanks in advance.

Peter Paluch · ‎06-25-2013

Hello,

What I am struggling to understand is where the configurable load-balancing hash in the configuration then comes into play, i.e. src-dst mac, src-dst ip, etc.

I am not sure if I am answering exactly what you are asking but let me try.

The hashing algorithm itself is a function that takes an input - be it a MAC address, IP address, sometimes L4 port information - and produces an output that is of constant length - either 3 or 8 bits. These are the "bit buckets" - the number of bits the input value is processed into.

As an example (though it is most probably not implemented this way), recall the MOD operation (modulo - remainder after an integer division). A simple hash function can be given as:

address MOD 8

Notice that the result of this function can be 0, 1, 2, ..., 7, and not more. So any input address will be processed into a value that is in the range of 0-7 and can hence be expressed using 3 bits, or 3 bit buckets. If you selected a pair of addresses to be load-balanced across, say, src-dst-ip, then usually an XOR operation between the source and destination address is first performed to produce a single value that "somehow" depends on both source and destination IP, and that value is then fed into the hash function, e.g.:

(sourceIP XOR destinationIP) MOD 8

If you performed an operation MOD 256 instead of MOD 8, you would get 8 bit results instead of 3 bit results.

All these are just examples - real hashing functions are more sophisticated to provide for good uniformity of the result.
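To make the XOR-then-MOD idea concrete, here is a minimal Python sketch. It is only an illustration of the toy formula above, not the hash the switches actually use; the function name `toy_hash` and the sample addresses are made up for this example.

```python
import ipaddress

def toy_hash(src_ip: str, dst_ip: str, bits: int = 3) -> int:
    """Toy src-dst-ip hash: XOR both addresses, then keep the low `bits` bits.

    (src XOR dst) MOD 2**bits is the illustrative formula from the post,
    not the real (undisclosed) hardware function.
    """
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % (1 << bits)

# 3-bit result: always in the range 0..7
print(toy_hash("10.0.0.1", "10.0.0.6"))                # 7
# 8-bit result: always in the range 0..255
print(toy_hash("192.168.1.200", "10.0.0.55", bits=8))  # 255
```

Because XOR is symmetric, swapping source and destination gives the same value, so both directions of a conversation would land in the same bucket under this toy function.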

The term "bit bucket" is probably referring to the fact that the hashing function is implemented in hardware for speed and efficiency reasons, meaning that the bits from the input address or addresses are fed into hardware-implemented arrays that perform bitwise operations on them and store the result into a selected number of bits - those bit buckets. However, even though I named, for example, the MOD operation here as a formal mathematical function, these switches - if they used the MOD operation - would have it implemented in hardware and they would be storing the result in a certain bit width, hence the number of bit buckets.

Please feel welcome to ask further!

Best regards,

Peter

rsimoni · ‎06-25-2013

Great explanation, Peter! The Cisco endorsement is fully deserved.

By the way, you are right: this is the concept behind the hashing mechanism of EC load balancing.

Riccardo

Peter Paluch · ‎06-25-2013

Hi Riccardo,

Long time no read, man! Thank you! I am sincerely honored.

Best regards,

Peter

rsimoni · ‎06-25-2013

Hi Peter,

yes I have been pretty busy with other stuff lately...

I am glad to see you are still hanging around on the CSC.

take care

Riccardo

Joseph W. Doherty · ‎06-25-2013

Disclaimer

The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.

Liability Disclaimer

In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.

Posting

What I am struggling to understand is where the configurable load-balancing hash in the configuration then comes into play, i.e. src-dst mac, src-dst ip, etc.

Choice of load balancing algorithm, relative to your traffic, is critical for Etherchannel load balancing as it provides the source of the values that a final hash value will be computed from. (I.e. hash attribute selection is much, much more than just a "tweak".) Generally, the more "random" your source values are, the more "random" the final computed hash will be. Remember the final computed values will be used to select an Etherchannel link.

For example, if you use src-mac, only the src-mac will be used to select your path. Consider traffic being sent from a router's gateway interface to a switch using an Etherchannel. All frames being sent from the gateway interface will have the same src-mac. So, regardless of whether a 3 bit or 8 bit hash is computed, all traffic will use just one link (as the source of the hash is always the same), regardless of how many links are in the Etherchannel.

Given a limited set of hash attribute choices, such as src or dest mac, traffic being sent from the gateway would normally be configured to use dest-mac. This assumes traffic is destined to different hosts from the gateway. Further, given the same limited options, traffic from the hosts to the gateway would normally be configured with src-mac. Again, we're assuming multiple hosts are sending traffic to the gateway.
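The gateway example can be sketched in a few lines of Python. This is a toy model under stated assumptions: the MAC addresses are made up, the "hash" is a plain MOD as in Peter's illustration, and the bucket-to-link mapping (`bucket % n_links`) is hypothetical, since the real mapping is not documented in this thread.

```python
def pick_link(key: int, n_links: int, hash_bits: int = 3) -> int:
    """Toy link selection: reduce the key to a bucket, then map bucket to link."""
    bucket = key % (1 << hash_bits)  # toy hash, not the real hardware function
    return bucket % n_links          # hypothetical bucket-to-link mapping

GATEWAY_MAC = 0x001122334455                         # made-up router MAC
HOST_MACS = [0x0A0000000001 + i for i in range(8)]   # made-up host MACs

# Frames sent from the gateway towards eight hosts over a 4-link bundle.
# With src-mac the key is always the gateway's MAC; with dst-mac it varies.
src_mac_links = {pick_link(GATEWAY_MAC, 4) for _ in HOST_MACS}
dst_mac_links = {pick_link(mac, 4) for mac in HOST_MACS}

print(sorted(src_mac_links))  # [1]          -> every frame on one link
print(sorted(dst_mac_links))  # [0, 1, 2, 3] -> all four links in use
```

In this model the width of the hash makes no difference to the src-mac case: a constant input always produces the same bucket.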

To simplify our choice, if the device also supported src-dest-mac, we could use that for either direction. Yet if there was only one host, or one host accounted for 99% of the traffic, we effectively will be using just one link again.

To counter the limited MAC problem, such as a single gateway and host, some devices offer other load balancing hash attribute choices, such as using IPs rather than MACs. Again, though, if 99% of the traffic is between a pair of IP hosts, we won't obtain any load balancing benefit from Etherchannel.

High traffic between a pair of IP hosts might still be distributed if the device's load balancing choices include UDP/TCP ports.
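As a rough sketch of why adding L4 ports helps, consider eight TCP sessions between the same two hosts. Everything here is assumed for illustration: the addresses and ports are made up, and folding the ports in with XOR is just one plausible toy scheme, not a description of what any Cisco platform does.

```python
import ipaddress

def toy_bucket(src_ip, dst_ip, src_port=0, dst_port=0, hash_bits=3):
    """Toy hash: XOR the IPs (and optionally the L4 ports), keep the low bits."""
    key = int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    key ^= src_port ^ dst_port
    return key % (1 << hash_bits)

N_LINKS = 4
# Eight sessions between one pair of hosts, differing only in source port.
flows = [("10.1.1.10", "10.2.2.20", 49152 + i, 443) for i in range(8)]

ip_only = {toy_bucket(s, d) % N_LINKS for s, d, _, _ in flows}
with_ports = {toy_bucket(s, d, sp, dp) % N_LINKS for s, d, sp, dp in flows}

print(sorted(ip_only))     # a single link: the IP pair never changes
print(sorted(with_ports))  # [0, 1, 2, 3]: the varying source port spreads flows
```

A single large session still stays on one link in this model, since none of its hash inputs change.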

As to the importance of 3 bit vs. 8 bit hash values, you're correct that this aspect determines how evenly the links can be balanced. Whether 3 bit or 8 bit, optimal load balancing will likely be achieved when the number of links is a power of 2, but an 8 bit hash will likely balance across 3 links better than a 3 bit hash. 8 bit also works better when the number of Etherchannel links exceeds 8.

For example, a 3 bit hash will have 8 values. On an Etherchannel of 3 links, we might map the first 3 values into 1st link (3/8 of the traffic), the next 3 values into the 2nd link (also 3/8 of the traffic) and the last 2 values into the 3rd link (2/8 or 1/4 of the traffic). Notice ideally we want 1/3 of the traffic across each of the 3 links, but we have 37.5%, 37.5% and 25%.

If, for the same 3 links, we use an 8 bit hash with 256 values, we can map 86 values onto the 1st link (86/256 of the traffic) and 85 values onto each of the 2nd and 3rd links (85/256 of the traffic each). Now our load balance would be 33.6%, 33.2% and 33.2% - much closer to our ideal of 1/3 each.
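The arithmetic in the two examples above can be reproduced with a short Python sketch. It assumes hash values are handed out to links as evenly as possible, with any leftover values going to the first links; which physical link actually receives the extra values on a real switch is not something this thread establishes. The function name `bucket_shares` is made up for this example.

```python
def bucket_shares(hash_bits: int, n_links: int) -> list:
    """Number of hash values assigned to each link, spread as evenly as possible."""
    buckets = 1 << hash_bits               # 3 bits -> 8 values, 8 bits -> 256
    base, extra = divmod(buckets, n_links)
    # The first `extra` links get one additional value (an assumption).
    return [base + 1 if i < extra else base for i in range(n_links)]

for bits in (3, 8):
    shares = bucket_shares(bits, 3)
    total = sum(shares)
    percents = [round(100 * s / total, 1) for s in shares]
    print(bits, shares, percents)
# 3 [3, 3, 2] [37.5, 37.5, 25.0]
# 8 [86, 85, 85] [33.6, 33.2, 33.2]
```

Calling `bucket_shares(3, 4)` returns `[2, 2, 2, 2]`, which is the power-of-2 case where even a 3 bit hash divides evenly.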

cbeswick · ‎06-27-2013

Hi Guys,

Peter / Joseph - many thanks for your responses.

So going back to how the member links are actually utilised within a port-channel. If we have 2 member ports we theoretically have the "potential" (I think this is key in my understanding here) to distribute traffic evenly across the links, but this is dependent on the configured / supported hashing algorithm on the switch. Using an example already discussed above, if we just use src-mac, and all traffic is passing between layer 3 devices with a layer 2 rewrite taking place, then all traffic will be sent across only one of the links. If we use a more granular configuration, say src-dst-ip, then traffic can be more evenly distributed across the member ports.

BUT - this is still very dependent on the types of flows and the algorithm being used. The number of links available in the port-channel, and the bits available for hashing the traffic across the links, is an independent part of the "load balancing".

My next question then is: how does this work with a switch which has 3 links in the bundle, but uses something like a Sup720 with only a 3-bit hash available for mapping traffic? If we use a very granular load balance hash in the config, say src-dst-ip-port, what is the actual consequence when the traffic is distributed 37.5%, 37.5%, 25%?

Does the 3rd link simply get underutilised? Even if the first two get maxed out?

What happens if one side of the port-channel terminates on a device using a Sup2T (which uses 8 bits for the hash) and the other connects to a Sup720 which uses 3 bits?

And finally, does the behaviour change at all when using a Layer 3 port-channel ?

Thanks again.

Joseph W. Doherty · ‎06-27-2013

So going back to how the member links are actually utilised within a port-channel. If we have 2 member ports we theoretically have the "potential" (I think this is key in my understanding here) to distribute traffic evenly across the links, but this is dependent on the configured / supported hashing algorithm on the switch. Using an example already discussed above, if we just use src-mac, and all traffic is passing between layer 3 devices with a layer 2 rewrite taking place, then all traffic will be sent across only one of the links. If we use a more granular configuration, say src-dst-ip, then traffic can be more evenly distributed across the member ports.

Correct (with) . . .

BUT - this is still very dependent on the types of flows and the algorithm being used. The number of links available in the port-channel, and the bits available for hashing the traffic across the links, is an independent part of the "load balancing".

Right. Again, an example of this is a pair of hosts exchanging (lots of) data. MACs, IPs and even UDP/TCP ports might remain the same, so all this traffic will take the same link. Distribution is per "flow" and is deterministic: a "flow" always takes the same link (unless the number of links changes).

Also remember, Etherchannel doesn't analyze actual link loading so in a short time interval link loading can be, and often is, very unbalanced. Load stats usually balance closer to theoretical loading over longer time intervals.

Also, because there's no distribution based on actual link loading, Etherchannel doesn't work as "well" as a single link of the same capacity. For example, if you have a dual Etherchannel, the first flow will map to one link and cannot exceed that link's bandwidth, even though a second link is sitting there unused. If a new flow comes along, it now has a 50/50 chance of sharing the link already being used by the first flow or using the unused link.
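The 50/50 figure generalises to more flows and links. The sketch below computes the chance that at least two flows end up on the same link, under the simplifying assumption that each flow is equally likely to hash to any link (which is exactly true only when the bucket count divides evenly). The function name is made up for this example.

```python
from fractions import Fraction

def p_some_link_shared(n_links: int, n_flows: int) -> Fraction:
    """Probability that at least two flows hash to the same link.

    Assumes every flow independently picks each link with equal probability.
    """
    if n_flows > n_links:
        return Fraction(1)  # more flows than links: sharing is unavoidable
    p_all_distinct = Fraction(1)
    for i in range(n_flows):
        p_all_distinct *= Fraction(n_links - i, n_links)
    return 1 - p_all_distinct

print(p_some_link_shared(2, 2))         # 1/2     -> the 50/50 case above
print(float(p_some_link_shared(8, 4)))  # ~0.59   -> 4 flows on an 8-link bundle
```

Even with twice as many links as flows, some flows are more likely than not to share a link while other links sit idle, which is why a bundle does not behave like one fat pipe.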

My next question then is: how does this work with a switch which has 3 links in the bundle, but uses something like a Sup720 with only a 3-bit hash available for mapping traffic? If we use a very granular load balance hash in the config, say src-dst-ip-port, what is the actual consequence when the traffic is distributed 37.5%, 37.5%, 25%?

Does the 3rd link simply get underutilised? Even if the first two get maxed out?

Yes, over time, the 3rd link should be less utilized than the other two, which should show about the same utilization. Again, especially in short term stats, load balances might be far off from expectations. For example, say every night you have some huge backup process that just happens to use link 3. In your daily stats, link 3 might show a higher overall utilization than either link 1 or link 2, or perhaps even their combined utilization.

What happens if one side of the port-channel terminates on a device using a Sup2T (which uses 8 bits for the hash) and the other connects to a Sup720 which uses 3 bits?

Each side is independent of the other. (Remember how even on the same platform type you might use different algorithms for the hash on the two sides of the link.)

And finally, does the behaviour change at all when using a Layer 3 port-channel ?

Nope.

Iulian Vaideanu · ‎05-24-2016

This looks like the right thread to ask for further information on the subject - I know that one cannot control the physical port that a flow would use in a port-channel but, in case of non-power-of-two member count, is it at least possible to influence which physical port(s) would be under-utilized?

For example, "show interfaces port-channel <n> etherchannel" would output "no of bits" 3/3/2 for a 3-port channel - could one choose which of the three members gets the 2 bit buckets (maybe by taking members out of the port-channel and adding them again), or is this "hard-coded" (depending on slot / port number)?

[edit] Could the config bit below help? If so, how? I couldn't find it explained anywhere...

rc1(config-if)#channel-group 1 mode on link ?
<1-16> Channel group load-balancing link identifier

Peter Paluch · ‎05-24-2016

Hi Iulian,

I have not yet seen any way of influencing the load balancing behavior of EtherChannel bundles on Cisco switches, apart from the usual port-channel load-balance command. The hashing function used in Cisco switches for EtherChannel load balancing was never publicly described, and I have not seen any commands that would allow modifying the "weight" or "preference" of bundled ports with regard to traffic distribution patterns.

In fact, I haven't seen the link keyword with the channel-group mode on yet :) What exact IOS and platform version are you using?

Best regards,
Peter

Iulian Vaideanu · ‎05-24-2016

Hi Peter,

I've only recently (as in today) found the "link" option myself, while looking to see what else can be configured on etherchannels - it seems to be available for active and passive modes as well (7606-S, RSP720-3CXL-GE, 12.2(33)SRE9).
