Solved: Re: EIGRP Unequal Load Balance

Othacon · ‎01-31-2024

Hi everyone,

hope you can help with EIGRP unequal load balancing in a Catalyst 3650.

I have a system where I need traffic to reach 10.1.40.0/24 via both vlan 4084 and vlan 4087. Because of the metric, vlan 4084 has the lowest FD and vlan 4087 is not installed in the routing table; it does not even appear as a feasible successor. I tried the variance command, but since the route doesn't appear in the EIGRP topology table it didn't work. If I run "show ip eigrp topology secondary-paths", however, the route does appear there, as per below:

Neighbours

Show ip eigrp topology

show ip eigrp topology secondary-paths

I need to force the traffic to go via vlan 4084 and vlan 4087 alike in order to load balance the links, since I have one link that is being overloaded. Please could you help me with this?

Thank you

David Ruess · ‎01-31-2024

Hello,

For EIGRP unequal-cost multipath, a route can only be used as an additional path if it meets the feasibility condition. This is part of EIGRP's loop-free decision process, and it is there to protect you.

The Reported Distance of the route must be less than the Feasible Distance of the current successor. In your example the "secondary" link has exactly the same metric as your FD. You would need to artificially inflate the metric of the successor path, or decrease the metric of the secondary path, until the reported distance from the upstream neighbor is less than the FD of the successor route.
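The feasibility condition can be written as a one-line check. This is only a minimal sketch: the function name is made up for illustration, 3328 is the value quoted later in this thread for both RD and FD, and 3584 is a hypothetical FD after the successor path has been inflated.

```python
def is_feasible_successor(reported_distance: int, feasible_distance: int) -> bool:
    """Feasibility condition: the neighbor's RD must be strictly below the local FD."""
    return reported_distance < feasible_distance


# RD = FD = 3328, as quoted in this thread: the strict inequality fails.
print(is_feasible_successor(3328, 3328))  # False

# Hypothetical value: the successor path is inflated so the FD becomes 3584.
print(is_feasible_successor(3328, 3584))  # True
```

The comparison is strict, which is why an RD equal to the FD is rejected.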

After that the variance command should work

Hope that helps

-David

M02@rt37 · ‎01-31-2024

Hello @Othacon

If the RD is greater than or equal to the FD of the current successor, EIGRP doesn't consider it as a feasible successor, and as a result, the variance command won't have the desired effect. In your case RD=FD=3328.

To enable unequal cost load balancing in such cases, you can indeed manipulate the metrics to meet the feasibility condition. You can artificially inflate the metric of the successor path or decrease the metric of the secondary path. This manipulation can be achieved using the metric command under the EIGRP configuration.

After making these adjustments, check the EIGRP topology table to ensure that the secondary path is now considered a feasible successor. Once you have feasible successors, you can use the variance command to achieve unequal cost load balancing.
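As a rough sketch of how the two checks combine: a path is a candidate for unequal-cost load balancing only if it is a feasible successor and its total metric falls within variance times the successor's FD. The function name and the metrics 3584 and 3840 are hypothetical. The comparison is written as strict here; treat the exact boundary case (metric exactly equal to variance times FD) as an assumption and avoid relying on it.

```python
def usable_for_unequal_cost(reported_distance: int, alternate_metric: int,
                            successor_fd: int, variance: int) -> bool:
    """True if the alternate path passes both the feasibility and variance checks."""
    feasible = reported_distance < successor_fd
    within_variance = alternate_metric < variance * successor_fd
    return feasible and within_variance


# Situation from this thread: RD = FD = 3328, so no variance value helps.
print(usable_for_unequal_cost(3328, 3840, 3328, 8))  # False

# Hypothetical metrics after tuning: successor FD 3584, alternate path 3840.
print(usable_for_unequal_cost(3328, 3840, 3584, 1))  # False (variance too small)
print(usable_for_unequal_cost(3328, 3840, 3584, 2))  # True
```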

Best regards

Georg Pauwen · ‎01-31-2024

Hello,

try to change the EIGRP metrics on the Vlan 4087 interface. Post the output of:

show ip eigrp interface vlan 4084

show ip eigrp interface vlan 4087

You could use the template below and use the same values for both interfaces:

ip bandwidth-percent eigrp <AS-number> <percentage>
ip delay eigrp <AS-number> <delay>
ip reliability eigrp <AS-number> <reliability>
ip mtu eigrp <AS-number> <mtu>
ip load-sharing eigrp <AS-number> <load-sharing-method>

Othacon · ‎02-02-2024

Thank you Georg, just executed the show commands and I get this:

From the template, the only command the switch really lets me add is "ip bandwidth-percent eigrp". It doesn't accept the rest on the interface.
I can add the delay and that's about it.

Joseph W. Doherty · ‎02-02-2024

"I can add the delay and that's about it."

I recall that's the normal way to tweak EIGRP path selection.
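To see why delay is the usual knob, here is a small sketch of the classic EIGRP composite metric, assuming default K values (only bandwidth and delay count) and classic, non-wide metrics. The inputs of 1 Gbps and 30 microseconds are hypothetical; they were picked only because they happen to reproduce the 3328 metric mentioned in this thread.

```python
def classic_eigrp_metric(min_bandwidth_kbps: int, total_delay_usec: int) -> int:
    """Classic EIGRP metric with default K values (K1 = K3 = 1, others 0)."""
    bandwidth_term = 10_000_000 // min_bandwidth_kbps  # slowest link on the path
    delay_term = total_delay_usec // 10                # delay in tens of microseconds
    return 256 * (bandwidth_term + delay_term)


# Hypothetical path: 1 Gbps minimum bandwidth, 30 microseconds cumulative delay.
print(classic_eigrp_metric(1_000_000, 30))  # 3328

# Adding 10 microseconds of delay on one interface raises the metric by 256.
print(classic_eigrp_metric(1_000_000, 40))  # 3584
```

Delay is cumulative along the path, so a change on one interface shifts the metric by a predictable amount, whereas the bandwidth term only reflects the slowest link.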

Joseph W. Doherty · ‎01-31-2024

The other posters have provided why variance isn't working.

Just a few things to keep in mind . . .

I believe unequal LB tries to distribute flows, proportionally. If so, is that what you want?

You can also set the metrics to equally distribute flows.
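A rough model of proportional distribution, assuming each path's share is inversely proportional to its metric. This is an approximation for illustration only, not the exact traffic-share calculation the switch performs, and the metrics are hypothetical.

```python
from fractions import Fraction


def proportional_shares(metrics: dict[str, int]) -> dict[str, Fraction]:
    """Split traffic so that each path's share is inversely proportional to its metric."""
    inverse = {path: Fraction(1, metric) for path, metric in metrics.items()}
    total = sum(inverse.values())
    return {path: weight / total for path, weight in inverse.items()}


# Hypothetical unequal metrics: the better path carries two thirds of the flows.
print(proportional_shares({"Vlan4084": 3584, "Vlan4087": 7168}))

# Equal metrics: the flows split evenly.
print(proportional_shares({"Vlan4084": 3584, "Vlan4087": 3584}))
```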

Remember, flow distribution, as I recall, doesn't take actual path loading into account by default. I.e. you may only achieve about an effective 50% increase in usable bandwidth. Will that solve your overloading issue? Even if it does, what happens if a path fails?

For link overloading, if not already using, you might consider using QoS.

Othacon · ‎02-01-2024

Thank you all for your replies

I will try to change the metric and maybe in the end not even use unequal load balance. I was thinking about the situation, and maybe on this occasion it is best to send the traffic via vlan 4087 only and leave vlan 4084 as a fallback. A fallback that will not have much effect, but still better than nothing.

@Joseph W. Doherty I did indeed use QoS but to no avail; it's simply too much data / spiky data. I inherited this system and it was really poorly designed. All the data goes to a central point from several remote sites and it's all being funneled. This is the rough design of the system:

The system is 2Gbps using LAG, but where the traffic flows to they've put 1Gbps between the ASW and the core, and the link on the other side is 1Gbps as well. I've already said that this needs to be changed, but in the meantime I need to find a temporary solution that keeps all the links active.

MHM Cisco World · ‎02-01-2024

Between which points do you need to balance the traffic load? Can you show that in the topology?

MHM

Othacon · ‎02-01-2024

@MHM Cisco World , as it stands all the blue traffic from the "loop" is being funneled to the purple arrow, reaching the ASW and going to the CSW via a 1Gbps Uplink. I need the traffic instead of just going all to CSW2, for CSW 1 to send the traffic to CSW 2 and CSW3 alike. I was thinking maybe even the best option would be to "remove" vlan 4084 from the routing table and to force EIGRP to put vlan 4087 instead. In case the link of vlan 4087 failed, the traffic would flow via vlan 4084. It would still create issues, but at least it would be something reaching the receiving point.

MHM Cisco World · ‎02-01-2024

The 1 Gbps link is not the issue here.

In the end, CSW1 has two paths:

2 + 2 Gbps

2 + 2 + 1 Gbps (or change the 1 to 2)

So CSW1 always prefers 2 + 2, i.e. via CSW2.

So, as the others mentioned, you need unequal-cost multipath; otherwise you will never get load balancing between the two links.

Also note that unequal-cost load balancing can give you asymmetric routing; if you have a firewall, that asymmetry can get your packets dropped.

MHM

Othacon · ‎02-01-2024

Indeed @MHM Cisco World, the issue I have is that when the data from CSW 2 reaches the receiving point it doesn't go straight into another core; it enters an ASW at 2Gbps, but then all the traffic is funneled to the main CSW at 1Gbps. This ASW is not able to deal with the traffic and starts dropping in the output queue.

The traffic that flows via CSW 3, on the other hand, despite being 1Gbps only as well, ends up straight in the main core switch, which distributes it through to the receivers.

The 1Gbps link between the ASW and the main core is being my main bottleneck in this case.

MHM Cisco World · ‎02-01-2024

The 1Gbps link between the ASW and the main core is being my main bottleneck in this case.

Are you sure the bottleneck is causing the drops on the ASW, and not asymmetric traffic?

Check the traffic rate on the ASW to see whether it is above 1 Gbps.

MHM

Joseph W. Doherty · ‎02-01-2024

@Othacon wrote:

@Joseph W. Doherty i indeed used QoS but to no avail, it's simply too much data / spiky data.

Well, although you're using QoS, almost any QoS textbook, IMO, doesn't teach its subtle points, so I suspect you might not be getting the most out of your devices' QoS capabilities.

Dealing with "spiky data", to me, is a QoS issue. Often, if it's microbursts, just increasing queue limits can dramatically decrease packet drops. BTW, on Catalyst 2K and 3K switches the buffer defaults often cause premature drops for such traffic, so much so that for years now Cisco itself has often recommended increasing the logical queue depth limits as the first thing to try when dealing with port drops on these switches. Sustained congestion, though, requires a wholly different approach.
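The difference between microbursts and sustained congestion can be illustrated with a toy tail-drop queue. This is a simplified model with made-up numbers (packets per tick, a fixed drain rate, a single FIFO), not a description of how the switch buffers actually behave.

```python
def simulate_drops(arrivals: list[int], drain_per_tick: int, queue_limit: int) -> int:
    """Count tail drops for a single FIFO queue drained at a fixed rate."""
    queued = 0
    dropped = 0
    for arriving in arrivals:
        queued += arriving
        if queued > queue_limit:
            # Anything beyond the queue limit is tail-dropped.
            dropped += queued - queue_limit
            queued = queue_limit
        queued = max(0, queued - drain_per_tick)
    return dropped


# Bursty traffic: 40 packets every 4th tick, which averages exactly the drain rate.
bursty = [40, 0, 0, 0] * 5
print(simulate_drops(bursty, drain_per_tick=10, queue_limit=20))  # 100
print(simulate_drops(bursty, drain_per_tick=10, queue_limit=40))  # 0

# Sustained overload: 15 packets per tick against a drain rate of 10.
sustained = [15] * 20
print(simulate_drops(sustained, drain_per_tick=10, queue_limit=40))  # 70
print(simulate_drops(sustained, drain_per_tick=10, queue_limit=80))  # 30
```

In this model a deeper queue removes the burst drops entirely, while under sustained overload it only postpones them.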

Now "seeing" your topology, and knowing you're using LAGs, it's very easy to bump into congestion issues.

Also, reading your subsequent posts, you've alleviated your issue by shutting down a link?

If so, firstly, tweaking routing metrics should allow you to accomplish the same without the need to physically shut down links. Secondly, surprised, shutting the CSW1<>CSW2 would have such a dramatic improvement impact, unless you've just relocated port drops to another switch and you haven't noticed that yet(?).

BTW, could you clarify where you're routing and where you're just passing L2 between devices?

Othacon · ‎02-02-2024

Well, after reading your post @Joseph W. Doherty, I remembered to check whether the switches had the "qos queue-softmax-multiplier" config. I had applied my policy map to the interface but assumed that this option was already enabled... it wasn't. I have now configured it, and I'm forwarding packets to that link without packet drops.

Honestly, I totally forgot about this option... still, in my opinion this system can't stay like this. We can't have 2Gbps ending up on 1Gbps links. The customer is also adding more devices on the remote ends, and this is increasing the bandwidth on the links. Since these devices all produce video traffic, the switches will need to handle more bursty data, and going from a "pipe" of 2Gbps to 1Gbps, it's truly not the best...

"Secondly, surprised, shutting the CSW1<>CSW2 would have such a dramatic improvement impact, unless you've just relocated port drops to another switch and you haven't noticed that yet(?)." It did, because the traffic was in essence divided in half: instead of all going to CSW 2, half of the traffic now went to CSW 3 and the other half to CSW 2. This relieved the 1Gbps link between the ASW and the CSW at the receiving point.

The traffic that comes from CSW3 enters directly on the CSW in the receiving point. The traffic that comes from CSW 2, enters first on the ASW, then goes to the CSW in the receiving point.

The entire infrastructure is made up of L3 devices routing between each other using transit vlans (vlan 4084 and 4087 are some of those transit vlans). The only device acting as pure L2 is the ASW at the receiving point, which receives the data from CSW2. This ASW then funnels all the data down the 1Gbps link to the main core at the receiving point. This ASW also has other devices connected to it, increasing the load on the 1Gbps pipe even more.