Load Balancing with One L3 Switch & Two ISPs

SHAFI Manjeri
Level 1

Dear Team,

As shown below, I have one switch acting as L3 and two managed routers running OSPF internally and MPLS externally. The routers are connected to different ISP MPLS networks (both ISPs connect to the same corporate network via MPLS). How do I load balance?

I need to share the load equally between both routers.

[attachment: SHAFIManjeri_0-1671971918250.png — network topology diagram]

 

10 Replies

You have one core switch, so you need PBR.

First, put PC0 in subnet x.x.x.x, and put PC1 and PC2 in subnet y.y.y.y.

Then use PBR:

ip access-list extended 100
 permit ip x.x.x.x <wildcard-mask> any
!
ip access-list extended 110
 permit ip y.y.y.y <wildcard-mask> any
!
route-map MHM permit 10
 match ip address 100
 set ip next-hop <ISP1>
route-map MHM permit 20
 match ip address 110
 set ip next-hop <ISP2>

This is what you need.
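Note that the route-map only takes effect once it is applied to the ingress interface on the core switch. A minimal sketch, assuming the hosts enter the switch via SVI Vlan10 (the interface name is an assumption):

```
! Apply the PBR policy on the hosts' ingress SVI (Vlan10 is an assumption)
interface Vlan10
 ip policy route-map MHM
```

Traffic matching ACL 100 is then forwarded toward ISP1's router and traffic matching ACL 110 toward ISP2's router; anything matching neither ACL is routed normally via the routing table.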

Joseph W. Doherty
Hall of Fame

For outbound, if your L3 switch and your two routers are running OSPF, it may be as simple as ensuring the L3 switch "sees" all destinations through your two routers at equal OSPF cost; OSPF will then do ECMP.

For inbound, you would need to explain the "other side's" topology.
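If OSPF ECMP is the route taken, note that on most IOS-based L3 switches it works without extra configuration, since OSPF installs up to four equal-cost paths by default. A minimal sketch of what to check on the L3 switch (the process ID is an assumption):

```
router ospf 1
 maximum-paths 4   ! 4 is the usual default; raise it only if you have more paths
!
! Then confirm a remote corp prefix lists BOTH routers as next hops:
!   show ip route <corp-prefix>
!   show ip cef <corp-prefix>
```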

I believe that the key requirement is this: "I need equally share the load with both routers". The solution suggested by @MHM Cisco World, with one host in x.x.x.x and 2 hosts in y.y.y.y, is highly unlikely to achieve equal load on both routers. The solution suggested by @Joseph W. Doherty comes closer, but I believe that ultimately it will not achieve equal load on both routers. The first challenge would be to be sure that both routers advertised exactly the same routes. You could achieve this if both routers advertise a default route and not any other prefixes. The second challenge is that even if the L3 switch has 2 equal default routes, it balances traffic per flow. It is likely that some flows will have more traffic and some flows less traffic, so equal load per router will not be achieved.

Perhaps this discussion is about semantics and the difference between load sharing and load balancing. Cisco focuses on achieving load sharing, not on equal load balancing.
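The per-flow behavior described here can be observed directly on the L3 switch: CEF hashes each source/destination pair to one of the equal-cost next hops, so a given flow always uses the same router. For example (the addresses are assumptions):

```
! Shows which of the two equal-cost next hops this src/dst pair hashes to
show ip cef exact-route 10.1.10.5 172.16.20.9
```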

HTH

Rick

"The solution suggested by @Joseph W. Doherty comes closer but I believe that ultimately it will not achieve equal load on both routers."

Very likely true, but generally only packet-by-packet, MLPPP, and similar solutions truly obtain almost exactly equal load balancing. ECMP often (but not always) comes close to equal load balancing over a longer time.

"The first challenge would be to be sure that both routers advertised exactly the same routes. You could achieve this if both routers advertise a default route and not any other prefixes."

Fully agree that both routers need to have exactly the same routes. I would assume this is the case, since the OP notes "both ISP are connecting same corp network via MPLS". Of course, this might not be the case, but having in the past supported a worldwide network using different MPLS SPs, with each site always connected to two of them for redundancy, we advertised the same prefixes from each site.

Unclear, though, why you believe you must only use a default route and not the corp prefixes to achieve ECMP.

"The second challenge is that even if the L3 switch has 2 equal default routes, it balances traffic per flow. It is likely that some flows will have more traffic and some flows less traffic, so equal load per router will not be achieved."

Again, this is true, but it is often (though not always) more of a short-term balancing issue.

If you really want to achieve the best load balancing, in this topology, you might need to look into using something like Cisco's PfR.  PfR can dynamically load balance flows.  It has some other features that are very, very nice in a multi-MPLS environment, addressing issues like MPLS vendor black holes and/or brown outs in part of their network.

I didn't mention PfR before, as its apparent complexity causes many to shy away from it (plus there's the extra cost of its licensing).
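For anyone curious what PfR involves, a rough sketch of a classic PfR setup follows; this is illustrative only, and all names, addresses, interfaces, and the key chain are assumptions:

```
! Master controller (often co-located on one of the WAN routers)
key chain PFR-KEY
 key 1
  key-string PFR-SECRET
!
pfr master
 border 10.0.0.2 key-chain PFR-KEY
  interface GigabitEthernet0/1 external
  interface GigabitEthernet0/0 internal
 mode route control
 no shutdown
!
! On each border (WAN) router
pfr border
 master 10.0.0.1 key-chain PFR-KEY
 local GigabitEthernet0/0
```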

Joseph, you and MHM have presented approaches that work toward the goal described in the original post. My focus was that the requirement of "equally share the load" is not a realistic requirement. Other than MLPPP, I am not aware of any Cisco technology that can really deliver equal load over multiple paths.

If the original poster was using the terminology loosely and really meant load sharing, then your suggestion would be good enough. But if the requirement is really equal load balancing, then we will not be able to achieve it.

HTH

Rick

Rick, I agree with you 100%.  If the OP really must achieve "equally share the load with both routers" 100% of the time, it cannot be done with the described topology.

However, if the OP desires to obtain something somewhat like load balancing, ECMP might be good enough.  Even better than ECMP is PfR's dynamic load balancing, which often balances much better than ECMP but still cannot guarantee 100% equal load balancing all the time (as it balances flows).  Even though PfR cannot offer 100% load balancing, it can offer something possibly even better: the best end-to-end performance, per flow.

"Other than MLPPP I am not aware of any Cisco technology that can really deliver equal load over multiple paths."

Well, as mentioned in my prior posting, packet-by-packet, but it is NOT recommended (unless you want to start modifying all your hosts' TCP fast retransmit triggers, etc., and even then I would advise against it).

Or, years ago, when using ATM links, Cisco provided ATM IMUX cards, which were sort of an ATM-cell version of MLPPP.  Of course, not applicable here; I mention it only as it was (still is?) a Cisco technology that did link load balancing.  For those seeking link load balancing, there might be other (non-ATM) similar third-party technology that would work with standard Ethernet frames/packets.

All in all, to the OP: again, Rick is correct. If you truly want "real" load balancing 100% of the time, you're not going to be able to achieve it.  However, if you have not considered ECMP, or have not tried it, give it a go.  Again, it might be "good enough".
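For reference, the packet-by-packet distribution discussed (and advised against) above is enabled per interface with CEF; the interface name here is an assumption:

```
interface GigabitEthernet0/1
 ip load-sharing per-packet   ! default is per-destination (per-flow); per-packet risks reordering
```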

Joseph, glad to know that we agree. Yes, you did mention packet-by-packet load share (and advised against it). I have 2 things to add to that part of the discussion:

1) If you specify packet-by-packet load share, you should get an equal number of packets on each link. But that does not necessarily mean an equal amount of traffic on each link. What happens if link 1 carries 10 packets of 500 bytes each and link 2 carries 10 packets of 1,500 bytes each? Certainly not equal load on each link.

2) I would like to describe my experience with a customer on this point. The customer had 2 data centers with 2 links connecting them. The main data center had a server running a critical app. Each night the server would archive all of its data to a server at the other data center. The archive process run time was about 4 hours and increasing. The customer wanted to shorten the run time, and they were aware that at night one of the 2 links between data centers was heavily used and the other was very lightly used (no surprise, since forwarding was flow based). They decided to implement packet-by-packet load share. My opinion was negative, but they were the customer, and so we did it their way.

They made the change, and that night the run time for the archive process was 7 hours. The next night the run time was 7.5 hours (so the first night was not a fluke). As we analyzed what was happening, we realized that packet-by-packet sharing was generating out-of-order packets. The process on the server reacted to out-of-order packets by dropping them and requesting retransmission. It took longer and generated more traffic on both links. So be careful what you wish for if you wish for packet-by-packet sharing.

HTH

Rick

Rick, regarding your first point, what you describe is possible, but unusual.  Why?  Because it posits a single flow like 1500 - 500 - 1500 - 500 . . .

Actually, I have seen flows like that, well actually more like 1000 - 500 - 1000 - 500 . . ., due to 1500 byte packets being fragmented.

BTW, I recall (???) MLPPP can run into a somewhat similar situation.  Say you have a flow (also unusual, even more so) like 1500 - 500 - pause (long enough that the 1500 transmission has finished) - 1500 - 500 - pause . . . this may create the same situation.  (Further, if you start mucking about with MLPPP's LFI options, like fragmentation size, in some corner cases you might unbalance the links even when all your packets are 1500.)

Regarding your second point, yup, that's not unexpected (also why doing this is NOT recommended).

As mentioned in my prior post, that could have likely been mitigated by adjusting the TCP fast retransmit parameter on the sending host, but again, you would need to make that adjustment on every host using those links (usually not practical).

But, laugh, although in your described case doing packet-by-packet actually worsened the effective transmission rate, I presume the links were nicely load balanced?

Joseph, I agree that my example is probably unusual. I chose it to emphasize a point. In evaluating possible solutions where the requirement is equal load on 2 links, I suggest that we need to emphasize what is possible rather than what is usual. We both know that Murphy's law applies: if something can go wrong, then it will go wrong, at an inopportune time. And I observe that for your suggested approach to work, it requires that every packet be exactly the same size, which is also unusual. You are looking at the original post through the lens of what is the best solution we can provide (and I usually share that perspective). I am looking at the original post through the lens of whether it satisfies the requirements.

And yes, in the experience I described, the links were fairly well (but not exactly) balanced.

HTH

Rick

Rick, I disagree with your, to me, (over) emphasis on the "unusual".  Why?

Have you ever flown commercially?  If so, you know, with Murphy's law, "bad" things can happen when flying, leading to your death!!!

Don't misunderstand.  I think it good that you brought up that something like ECMP (or PfR too) will not guarantee 100% load balancing.  But (what seems to me) an (over) emphasis on some low-probability corner cases might imply that ECMP (or PfR) is not even worth trying if your "requirement" is 100% load balancing.  Likely this isn't your intent, but when you highlight cases like 1500-byte packets on one link while 500-byte packets are on the other, which (agreed) will leave the routers unbalanced, without also noting (as you now agree) that such a case is unusual, it might imply something like ECMP isn't worth using at all.  (Again, I suspect that's not your intent.)

As you often note in many of your replies, there's much we don't know, including whether the OP has or currently uses ECMP or PfR, and/or just how much of a "requirement" true load balancing is.

Let's first assume neither ECMP nor PfR is being used, i.e. all traffic is just using one egress router.

If we enable ECMP (also assuming it's possible), and assuming there's more than one flow, there's likely to be some usage of both egress routers.  Even for your example of 1500 bytes on one link while 500 bytes on the other, we've now achieved a load share of 3:1.  Not an ideal 1:1, but better than 1:0.

Usually with more flows, ECMP often achieves a load share approaching 1:1, especially over longer measured time intervals.  (Again, as you've correctly noted, this is NOT guaranteed.)

Let's next assume, we have one bandwidth hog flow (like your real world case), running concurrently with other non-bandwidth-hog flows.

With ECMP, it's possible, but unusual (in most cases), that all the flows will be directed to just one path leaving the other with zero traffic.  It's also possible, but also unusual, that the bandwidth hog flow will obtain exclusive use of one path while all the other flows using the other path.

Add PfR to the mix, and the latter is no longer unusual, but the expected result!

Move to two bandwidth hog flows, with other concurrent flows, and with ECMP it's basically a similar situation to the one-bandwidth-hog case.

Add PfR and now you should see one bandwidth hog flow per egress path, with all the other concurrent flows split as much as possible, based on bandwidth demand, across the two paths.

I.e. with PfR you should achieve the best bandwidth balancing possible, while keeping flows per path.  In the latter dual bandwidth hog situation, you should achieve 1:1 load balancing, as the two bandwidth hog flows, alone, will saturate the two egress paths.

Again, with PfR, although it doesn't guarantee 100% balancing, all the time, it should guarantee the best possible load sharing of your multiple paths (again based on balancing flows).

Lastly, in a real world situation like Rick described (BTW, Rick, what was your final solution, if any?), there are approaches to deal with moving a single "data set" across multiple paths.

For example, several times mention has been made of using packet-by-packet distribution, along with why it's generally NOT recommended.  As also mentioned, the out-of-order retransmit problem can be mitigated by adjusting the dup ACK counter, which can be done per TCP flow.  I.e. if you devote a router doing packet-by-packet for this one flow, you can use your multiple links for that flow.

Would such a solution be the best possible?  That's an "it depends" answer.  Again, there are other approaches, including obtaining more bandwidth (perhaps on just one egress router - BTW, PfR can load share proportionally too) and/or using WAN accelerators, etc.