Nexus 9300 port-channel load balance strategy

Hello.

Please see the attached image showing uneven traffic across 4 port-channel links (2 links each, on a primary and a secondary Nexus 9300).

[Attached image: Nexus-4 port data history of po26.png]

GIVEN CONFIG:

Nexus9300# sh port-channel load-balance
System config:
Non-IP: src-dst mac
IP: src-dst ip-l4port rotate 0

---

QUESTIONS: 

1. What does "rotate 0" mean?

2. Currently only the default load-balance config exists, with no QoS. Because the network is dropping packets to a new and critical server, my plan is to have these 2 core datacenter Nexus switches, which run in a redundant active-active configuration, use per-packet load balancing, meaning all packets are balanced evenly across the 4 paths between the server and the Nexus switches, regardless of destination. Is this a wise strategy?

Thank you.

 


Christopher Hart
Cisco Employee

Hello!

The purpose of the rotate option used in the port-channel load-balance global configuration command is documented under the "Configuring ECMP Load Balancing" section of the "Configuring Port Channels" chapter of the Cisco Nexus 9000 Series NX-OS Interfaces Configuration Guide. Note that in the documentation, this option is defined for the ip load-sharing global configuration command and not the port-channel load-balance global configuration command - however, the rotate option is valid for the port-channel load-balance global configuration command too, and its meaning/definition is identical.

"The rotate option causes the hash algorithm to rotate the link picking selection so that it does not continually choose the same link across all nodes in the network. It does so by influencing the bit pattern for the hash algorithm. This option shifts the flow from one link to another and load balances the already load-balanced (polarized) traffic from the first ECMP level across multiple links.

If you specify a rotate value, the 64-bit stream is interpreted starting from that bit position in a cyclic rotation. The rotate range is from 1 to 63, and the default is 32."
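
To make the quoted definition a little more concrete, here is a minimal toy model in Python. It is only a sketch under assumptions: the real NX-OS hash is implemented in hardware and is not reproduced here, and the names `rotate_left_64` and `pick_link`, the left rotation direction, and the modulo step are all hypothetical choices made for illustration.

```python
MASK64 = (1 << 64) - 1  # keep every value inside 64 bits


def rotate_left_64(value: int, rotate: int) -> int:
    """Cyclically rotate a 64-bit value left by `rotate` bit positions."""
    rotate %= 64
    value &= MASK64
    return ((value << rotate) | (value >> (64 - rotate))) & MASK64


def pick_link(hash_input: int, rotate: int, num_links: int) -> int:
    """Hypothetical link choice: rotate the 64-bit input, then reduce it
    modulo the number of links (an assumption, not the real ASIC logic)."""
    return rotate_left_64(hash_input, rotate) % num_links


# Made-up 64-bit value standing in for the header fields of one flow.
key = 0x0123456789ABCDEF
for r in (0, 4, 8):
    print(f"rotate {r}: link {pick_link(key, r, 4)}")
```

In this toy model the same input lands on link 3 with rotate 0, link 0 with rotate 4, and link 1 with rotate 8, because a different part of the bit stream ends up deciding the link. The rotate value is a static setting, so for any one setting a given input always maps to the same link.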

This is an esoteric definition, so explaining the problem that the rotate option attempts to solve is probably the better route.

The rotate option is NX-OS's solution to the classic CEF polarization problem. (Another good article that covers this scenario from a Cisco Nexus perspective is the "Troubleshoot polarization in port-channel load balancing" document.) Polarization arises in a multi-tier network topology with redundant active/active links, achieved either through Equal Cost Multipathing (ECMP) or through port-channel load balancing via Multi-Chassis EtherChannel (MCEC) or Multi-Chassis Link Aggregation Groups (MC-LAG); either method will encounter the same problem. It occurs when multiple network devices use the same load balancing/hashing algorithm (which is likely if most network devices are the same model or the same series) and are subject to the same traffic flows: every tier then makes the same link choice for a given flow, so traffic that the first tier has already split piles onto the same link at the next tier.

Even though Nexus switches don't run CEF (Cisco Express Forwarding), they are subject to this "same hashing algorithm" issue. In fact, any networking device that leverages hashing algorithms from any vendor would encounter this same issue - this is not a Cisco-specific issue, this is an industry-wide issue!
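
The polarization effect described above can be reproduced with a small simulation. This is a hedged sketch, not a model of any real switch: SHA-256 stands in for the hardware hash, the addresses and ports are made up, and `flow_hash` and `pick_link` are hypothetical helper names.

```python
import hashlib

MASK64 = (1 << 64) - 1


def flow_hash(flow: tuple) -> int:
    """Stand-in 64-bit hash of a flow tuple (SHA-256 is an assumption)."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_link(flow: tuple, num_links: int, rotate: int = 0) -> int:
    """Rotate the hash cyclically, then reduce it modulo the link count."""
    h = flow_hash(flow)
    r = rotate % 64
    h = ((h << r) | (h >> (64 - r))) & MASK64
    return h % num_links


# 1000 made-up TCP flows that differ only in source port.
flows = [("10.0.0.1", "10.0.1.1", 6, 40000 + i, 443) for i in range(1000)]

# Tier 1 has two uplinks; keep only the flows it sends on uplink 0.
to_next_tier = [f for f in flows if pick_link(f, 2) == 0]

# Tier 2 with the identical hash repeats the same decision for every flow,
# while a different rotate value looks at different bits of the hash.
links_same_hash = {pick_link(f, 2) for f in to_next_tier}
links_rotated = {pick_link(f, 2, rotate=8) for f in to_next_tier}

print("tier-2 links used, same hash:", sorted(links_same_hash))
print("tier-2 links used, rotate 8: ", sorted(links_rotated))
```

With the identical hash at both tiers, the second tier uses only one of its two links, which is the polarization. Giving the second tier a different rotate value lets both links carry traffic in this model.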

Before tinkering with the rotate option, my advice would be to investigate the following two points:

  • We can see that the traffic on this port-channel/vPC is polarized, with the primary Nexus switch transmitting much more traffic than the second Nexus switch towards the relevant host. What I would investigate next is, where are the Nexus switches receiving this traffic from? Is the traffic originating from other hosts that are directly connected to these two Nexus switches, or is the traffic originating from hosts in remote parts of the network?
  • Next, I would investigate the profile of the traffic being sent to this relevant host. Specifically, I would try to identify whether there are one or more elephant flows that are polarized to the primary Nexus switch. If there are multiple elephant flows, then it may be worth tinkering with load balancing algorithms to try to distribute those elephant flows between the primary Nexus switch and the secondary Nexus switch. However, if there is a single elephant flow, tinkering with load balancing algorithms won't help you, since hashing algorithms work on flows, not on individual packets.

You brought up per-packet load balancing/hashing in your post, so I want to stress the last sentence of my second bullet point. The overwhelming majority of network devices - including Cisco Nexus switches - only support flow-based load balancing/hashing; per-packet load balancing is not possible with these switches. I have a post on my personal blog that discusses how load balancing (again, either ECMP or port-channel load balancing, both are typically identical in behavior) works on most modern network devices that demonstrates this point in a bit more detail.
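
As a rough illustration of why flow-based hashing cannot spread one flow, the sketch below hashes the same 5-tuple for many packets. It is an assumption-laden toy: SHA-256 replaces the real hardware hash, the addresses and ports are invented, and `member_link` is a hypothetical name.

```python
import hashlib


def member_link(src_ip, dst_ip, proto, src_port, dst_port, num_links):
    """Pick a port-channel member from the flow's 5-tuple only.
    Nothing that varies per packet (sequence number, payload) is hashed."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % num_links


# One made-up elephant flow; every packet carries the same 5-tuple.
elephant = ("192.0.2.10", "192.0.2.20", 6, 51000, 2049)
links_used = {member_link(*elephant, num_links=4) for _packet in range(10_000)}

print("distinct links used by 10,000 packets of one flow:", len(links_used))
```

Every packet of the flow produces the same hash input, so all 10,000 packets pick the same member link, no matter how many links the port-channel has.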

I hope this is helpful - thank you!

-Christopher

Hi Chris.

I expect a critical elephant flow is dropping packets.

How do I enable the port channel to not use load balancing, but to simply send each packet in a round robin order across each ethernet cable?

Hello!

There is no way to configure the switch to send each packet in a round-robin order across each member of a port-channel. Round-robin load balancing (which is one form of per-packet load balancing) is not supported on most network devices, including Cisco Nexus switches.

Thank you!

-Christopher

Thank you for your reply.

GIVEN CONFIG:

Nexus9300# sh port-channel load-balance
System config:
Non-IP: src-dst mac
IP: src-dst ip-l4port rotate 0

QUESTIONS:

1. Would changing "rotate 0" have basically no impact on solving this uneven load situation?

2. If multiple SEPARATE TCP streams with the SAME source and destination MAC and IP addresses are flowing, will the existing config load balance them across many links?

3. If 1 elephant stream is dropping packets, then what is the best solution?

Thank you!

Hello!

Answers to your questions are below:

  1. I would not expect modifying the rotate option on the primary and secondary Nexus switches connected to the relevant host to have any impact on this issue. However, if the source of traffic towards the relevant host is transmitted to the primary and secondary Nexus switches through one or more upstream switches, then depending on the topology, it's possible that modifying the rotate option on those upstream switches may have an impact on this issue. This is assuming that we haven't isolated a single critical elephant flow that is causing this issue, of course - if we've isolated the issue to a single critical elephant flow, then the rotate option will almost certainly have no impact on this issue, regardless of where it's applied.
  2. If multiple separate TCP streams (meaning, separate source or destination TCP ports) with the same source/destination MAC address and source/destination IP address are flowing, then the existing configuration will load-balance across multiple links within the port-channel in 99% of scenarios. However, it's possible that one gets exceptionally unlucky with the source/destination TCP ports such that most of the traffic happens to hash across a single link within the port-channel. I've seen this in a production network before, but it's an exceedingly rare scenario.
  3. If a single elephant stream is maximizing the link bandwidth of a single member of a port-channel, then unfortunately, there are only two solutions:
    1. Find some way to "diversify" the elephant stream such that it's split up amongst multiple separate streams. This may be possible depending on the specific application the elephant stream belongs to, as some applications are able to use multiple sockets on a single client machine to transfer data to a single server.
    2. Increase the bandwidth of each member of the port-channel. For example, if you're currently using one or more 10Gbps links to make up your port-channel, consider upgrading the port-channel members to use 25Gbps or 40Gbps links instead.
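
The "diversify the elephant stream" option can be sketched as follows. This is only an illustrative model: SHA-256 stands in for the switch's hash, the addresses, ports, and stream count are made up, and `member_link` is a hypothetical name.

```python
import hashlib
from collections import Counter


def member_link(src_ip, dst_ip, proto, src_port, dst_port, num_links):
    """Pick a port-channel member from the flow's 5-tuple (toy hash)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % num_links


# The same client/server pair, but the transfer is split into 64 parallel
# TCP streams, each with its own (made-up) source port.
streams = [("192.0.2.10", "192.0.2.20", 6, 51000 + i, 2049) for i in range(64)]
per_link = Counter(member_link(*s, num_links=4) for s in streams)

for link in sorted(per_link):
    print(f"link {link}: {per_link[link]} streams")
```

Because the source port is part of the hash input, the separate streams can land on different member links, while each individual stream still stays on one link. The split is statistical rather than guaranteed to be even, which matches the "exceptionally unlucky" case mentioned in answer 2.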

Thank you!

-Christopher

Hi Chris.

In the situation discussed, there are 2 redundant Nexus 9300 switches, each with a port-channel of 2 ports (2+2=4 ports; let's name them A and B on switch one, and Y and Z on switch two). When connections with the same IP addresses and ports are repeatedly closed and reopened, does ALL traffic traverse the same single Ethernet cable, or will it traverse 2 Ethernet cables (for example, ports A and Y)?

Let me explain what I understand from @Christopher Hart's comment.

If the traffic is continuous, it always passes through one link of the port-channel (PO).

If the traffic is not continuous:

First connection (same source, same destination, same ports): the Nexus switch calculates the hash and selects link x from the PO.

Second connection (same traffic): the Nexus switch calculates the hash again, and here is the trick.

Rotate not configured:

The Nexus switch selects the same link x as for the first connection.

Rotate configured:

The Nexus switch adds a number to the hash result, which in the end makes it select a link other than link x of the PO.

This way, the same traffic does not always use the same link of the PO.
