Re: Problems with load balancing and IP Route Cache

cbeswick · ‎08-08-2005

Hi,

Imagine, if you will, a head office and a remote site. Interconnecting the two sites are 2 WAN links provided by 2 independent ISPs, each running BGP on their internal networks. At each site there is a 3550 Layer 3 switch. This acts as the access router, interconnecting the 2 ISPs and providing "fan out" connectivity to the local LAN at each site.

Each site uses OSPF for local routing. OSPF-to-BGP redistribution into each ISP subnet is taking place at each provider CE router.

So basically what we have is this :

Subnets at the remote site are being advertised up to the head office over equal cost paths to each ISP CE router, and subnets at the head office are being redistributed to the remote site over equal cost paths to each ISP CE router.

Looking at the routing tables on each 3550 at each site, I can see 2 paths to the subnets at the other site, one via each provider CE. I would think that this would be enough for load balancing to take place, such that if one circuit fails, no network connectivity will be lost and whichever circuit remains active will simply forward all traffic between the 2 sites.

However, when simulating such a failure, it took 30 seconds until traffic successfully flowed over the remaining WAN circuit, which funnily enough is the same amount of time the OSPF dead timer takes to age out routes from the failed link.

I spoke to a techy from one of the ISPs, and he recommends turning off "ip route cache" for the VLANs advertising / learning subnets over each ISP circuit. He says that a cache flow is being created for one of the links and this is being preferred all the time. The recommendation therefore is to turn off ip route caching so that each packet is analysed one by one, so that should a link failure occur, packets will instantly take the diverse route provided by the other ISP.

This is where I get confused, because looking in the ip cef tables (i.e. the route cache), routes to subnets at the remote sites are being learned over each ISP circuit.

Is my techy correct in his solution? Has anybody else come across similar problems?
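One way to compare the two views is to look at the same remote prefix in both the routing table and the CEF table. This is only a sketch: the subnet 10.20.0.0/24 and the host addresses are hypothetical placeholders, not addresses from this network, and the exact output differs by platform and release.

```
! Routing table view: lists both next hops if the two paths really are equal cost
show ip route 10.20.0.0 255.255.255.0
! CEF (FIB) view of the same prefix, including its load-sharing paths
show ip cef 10.20.0.0 255.255.255.0 detail
! Which of the paths one particular source/destination pair is forwarded over
show ip cef exact-route 10.10.0.5 10.20.0.5
```

If both next hops appear in both tables, forwarding is already using both circuits, and the open question is how quickly a failed path is removed from them.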

Sorry for the long text. Any help would be greatly appreciated.

Chris.

v_milenko · ‎08-09-2005

Hi Chris!

When you simulate the failure, what happens to the routing table? Does the route through the failed WAN link disappear immediately, or only after some time?

Victor Milenko.

cbeswick · ‎08-09-2005

It disappeared after 30 secs, after the OSPF dead timer expired.

I have just had a conversation with my technical friend from the ISP. He assures me that disabling ip route cache will solve the problem, as the flow is being created by MAC address, not IP destination.

i.e. when a packet from a given MAC address enters the Layer 3 switch, its destination is looked up and then stored as a flow. All subsequent packets wanting to get to the same destination use the same flow, so load sharing isn't being taken advantage of. If I turn off route caching, process switching is forced, which means that every individual packet is processed and an individual lookup is carried out, so that in the event of a link failure it should load balance more effectively.
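For reference, the change being described would look roughly like the following. This is only a sketch of the ISP engineer's suggestion, not a recommendation: the interface name Vlan10 is a hypothetical placeholder, and how much effect the command has on a hardware-switching platform such as the 3550 is platform dependent and not confirmed here.

```
interface Vlan10
 ! Turn off the interface route cache so that packets on this
 ! interface are handed to process switching
 no ip route-cache
```

Process switching every packet costs CPU, so this kind of change is usually worth testing in a maintenance window first.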

Whether this works or not, I do not know. I am going to test it on Wednesday night.

ruwhite · ‎08-10-2005

CEF isn't a route cache.... So turning off CEF isn't going to help anything. CEF is a forwarding table, built in real time from the information in the routing table. Even in the old fast and optimal cache situations, removal of a route would cause the cache entry related to that route to be removed immediately, not after some period of time.

The problem here is the length of time it takes for the routers to learn one of the two paths is gone, and to switch to the other path, or rather remove the second path from the table. You say these paths are both learned over an MPLS VPN--how is this set up? Is it a PE/CE link using OSPF sham links, or just RFC2547 BGP extended communities to push the OSPF metrics across the MPLS cloud? Or are you actually forming an OSPF adjacency through an MPLS tunnel? OR are you just redistributing BGP into OSPF at the PE or the CE?

The answer to these questions is going to determine what the next step is to get your convergence times down.

:-)

Russ.W

cbeswick · ‎08-10-2005

Hi all,

Russ: As far as I am aware, OSPF is being redistributed into each provider's BGP network at each provider CE.

All statics and OSPF learned routes are being learned back across the site as OSPF external type 2s. That is, all routes learnt by the remote site are type 2s and all the routes learnt by the head office from the remote site are type 2s, including some statics that need to be redistributed.

Hope this helps.

Thanks all for your responses and help.

Chris.

ruwhite · ‎08-11-2005

If there's redistribution, the problem is most likely in one of two places:

-- The amount of time it takes the redistributed routes to "filter through," and be removed from the local OSPF tables. This would include OSPF timing and processing itself.

-- The amount of time it's taking the BGP routes in the SP's network to converge.

Since the SP's network is all iBGP at this point, I'm not certain that's going to be a problem. I think it shouldn't take any more than 1 second to effect a redistribution--maybe 1.5 seconds would be the outside number I would expect.

Your last message said you were going to focus on OSPF next, and that's where I think the problem most likely is, as well. The next question is: Are you waiting on an OSPF adjacency to time out in order to see the route fail? If so, then faster OSPF hellos might be an option for you (?).
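As a rough illustration of the fast hello option, the interface configuration might look like the sketch below. The interface name Vlan10 is a hypothetical placeholder, and this assumes an IOS release that supports OSPF fast hellos; whether the image on these 3550s does is not confirmed in this thread.

```
interface Vlan10
 ! Dead interval of 1 second, with 4 hellos sent during that second
 ip ospf dead-interval minimal hello-multiplier 4
```

The dead interval has to match on every OSPF neighbor on the segment, otherwise the adjacency will not form, so the same change is needed on the router at the other end.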

Anyway, happy hunting in your next steps, and post questions or doubts if you run across any.

:-)

Russ.W

joyride_us · ‎08-10-2005

Hi,

the CCIE will provide you with the real solution. But not being CCIE, I have mine! Cheaper but they work! :)

Basically you want to load-share your traffic to the remote site through 2 ISPs (well, all in all, through 2 different paths).

What you could do is this: do not redistribute OSPF into BGP. The primary path can be routed with a static route and the alternative path with the default route (hopefully the default route is "available"; if not, try the brand new "ip route xxxxx track 1" feature).

I have already used this technique, with a failover time of about 1 second. Getting it perfect (seamless) will take time and might never work, and 1 second is usually fine even for TCP traffic. (The static route becomes inactive upon disruption because it uses a serial interface, which goes down, and the router switches at once to the default route.) I am using it: good stuff and quick to implement!

Otherwise, I am not sure, but I would say that your "load-balancing" takes time because it load-balances per flow (are you sure that the costs are equal?), not per packet. Per flow is the default. Of course, if you cut a connection, per-flow load-balancing will not help and convergence will occur with traffic disruption. Try configuring per-packet load-balancing (with CEF) if you wish.
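A minimal sketch of that static-plus-default arrangement, with the tracking option added, could look like this. Every address and interface name here is a hypothetical placeholder (10.20.0.0/24 for the remote subnet, 192.0.2.1 and 198.51.100.1 for the two ISP next hops, Serial0/0 for the primary link), and object tracking syntax varies between IOS releases, so treat it as an outline rather than a tested configuration.

```
! Tracked object 1 follows the line protocol state of the primary link
track 1 interface Serial0/0 line-protocol
! Primary path: static route via ISP 1, installed only while object 1 is up
ip route 10.20.0.0 255.255.255.0 192.0.2.1 track 1
! Alternative path: default route via ISP 2, used once the static is withdrawn
ip route 0.0.0.0 0.0.0.0 198.51.100.1
```

Tracking only the local line protocol will not catch a failure further inside the provider network; a reachability probe is the usual answer to that, but its syntax depends on the release.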

I am not sure if I am helping there, but I would for sure try my "special" first!

Good luck!

aravindhs · ‎08-10-2005

Hi Chris,

CEF by default does per-destination load-balancing. Hence, this is on a per-flow basis, as you said.

You should just enable per-packet load balancing on the outgoing (WAN) interfaces.

! ISP1 link
Router(config)# interface serial 1
Router(config-if)# ip load-sharing per-packet
! ISP2 link
Router(config-if)# interface serial 2
Router(config-if)# ip load-sharing per-packet

If you disable route-cache, CEF will be disabled and hence it does not help the cause.

So you might wanna try doing this simple stuff which will ensure the flow is per-packet based rather than per-destination based.
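To check the setting afterwards, or to back it out, something along these lines should work. The interface name is taken from the example above; whether a 3550 honours per-packet load sharing in hardware is an assumption that needs checking against the platform documentation.

```
! Show the CEF settings for the interface, including load sharing
show cef interface serial 1
! Return to the default per-destination behaviour
configure terminal
 interface serial 1
  ip load-sharing per-destination
```

Per-packet load sharing can deliver packets of one flow out of order when the two paths have different delay, so it is worth watching application behaviour after enabling it.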

HTH

Do let us know your result.

Cheers

Arav

cbeswick · ‎08-11-2005

Hi all,

We have determined that the problem is more to do with the OSPF convergence at the head office. We are carrying out a change on Monday night (15th August) to bring down the hello timers and the dead interval timers to speed up convergence.
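As an illustration of that kind of change, lowering the timers on an OSPF interface looks roughly like this. The interface name Vlan10 and the values 1 and 3 are hypothetical placeholders, not the values planned for Monday.

```
interface Vlan10
 ! Send hellos every 1 second (the default on broadcast networks is 10)
 ip ospf hello-interval 1
 ! Declare the neighbor down after 3 seconds without a hello
 ip ospf dead-interval 3
```

Hello and dead intervals must be identical on all OSPF routers on the segment or the adjacency will drop, and `show ip ospf interface Vlan10` displays the timers actually in use.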

I will let you know the outcome.

Chris.