HSRP failover times

g.peart · ‎08-25-2011

Hi all,

I have been testing an HSRP setup using HSRP v1 and have been wondering why it takes so long to switch back to the original active router after it has recovered from a failure.

With the default timers, I see a loss of packet forwarding for 28 secs when moving to the standby router, even though the routing protocol has converged, and when the original active router is restored, packet forwarding is lost for 50 secs.

I've included a topology map and the standby debug.

Why does packet forwarding not happen until the active router is found?

Ideas and views welcome.

TIA

Florin Barhala · ‎08-26-2011

HSRP is a pretty old protocol now and not much used anymore. For "today's network standards" I suggest you tweak its timers to smaller values.

Peter Paluch · ‎08-26-2011

Florin,

> HSRP is a pretty old protocol now and not much used anymore. For "today's network standards" I suggest you tweak its timers to smaller values.

Whoa! This is quite a strong assumption, considering that HSRP was, for a long, long time, the only FHRP supported on lower-end Cisco multilayer switches, and is a routine part of many Cisco design documents... and of all current relevant certification exams, too. Not much used? I absolutely disagree with that assumption. By no means am I a representative person to say this on a global level, but at least personally, I encounter HSRP all the time.

In fact, there are not many choices in the FHRP field. You have the "open" VRRP, which Cisco claims infringes its patents on HSRP (and which is only supported on the 3560 and higher since 12.2(58)SE); you have HSRP, universally supported across Cisco product platforms; and of course GLBP, which is supported only on the Cat4500 and higher. The 'ucarp' approach from BSD is kind of specific, and given that Cisco patented the idea of the tuple that allows seamless transition from one router to another, I do not believe any other protocol can come up with anything comparably functional and yet not infringe Cisco's patents.

Old or not, the question is whether it is up to today's needs. Note that a protocol is often deemed "old" just because it reacts slowly. However, that is a conceptual problem: the Hello protocol so often used with most of today's protocols is first and foremost intended to convey configuration data and discover neighbors. It is not so well designed to rapidly detect the loss of a neighbor, and it should not be. Neighbor loss detection is a specific requirement for which a separate, lightweight protocol can be used, and recently such a protocol has indeed been introduced: BFD. It is now a matter of integrating the existing protocols with BFD detection (making them BFD clients) to react rapidly to a neighbor loss, with low CPU demands. Hence, combining HSRP with BFD can provide exceptionally fast convergence without tweaking HSRP's timers themselves.
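To put a number on "exceptionally fast", here is a rough sketch of how BFD's detection time is derived in asynchronous mode, as I understand RFC 5880: the remote peer's detect multiplier times the larger of the local required receive interval and the remote desired transmit interval. The function name and the 50 ms / multiplier 3 values are illustrative assumptions, not settings taken from this thread.

```python
def bfd_detection_time_ms(local_min_rx_ms: int,
                          remote_min_tx_ms: int,
                          remote_multiplier: int) -> int:
    """Approximate BFD asynchronous-mode detection time in milliseconds.

    The interval actually used is the slower (larger) of what the local side
    is willing to receive and what the remote side wants to send. The session
    is declared down after `remote_multiplier` such intervals pass with no
    BFD control packet received.
    """
    negotiated_interval_ms = max(local_min_rx_ms, remote_min_tx_ms)
    return negotiated_interval_ms * remote_multiplier


if __name__ == "__main__":
    # Hypothetical values: 50 ms intervals on both sides, multiplier 3.
    fast = bfd_detection_time_ms(50, 50, 3)
    print(f"BFD detects neighbor loss after about {fast} ms")
```

Compare that with the HSRP default hold time of 10 seconds: the hello timers can stay conservative while the failure detection itself is handed off to BFD.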

Best regards,

Peter

g.peart · ‎08-26-2011

I will be testing the setup using msec timers, but I was just surprised to see it take so long to fail over, and then even longer when failing back, when using the defaults. So my main point: is this normal behaviour?

Jon Marshall · ‎08-26-2011

No, 28 secs is not normal.

By default HSRP hellos are sent every 3 seconds. If a hello has not been received for 10 seconds by the standby then it becomes active.
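As a rough illustration of those defaults, the sketch below works out how long the standby waits before taking over. It assumes the hold timer restarts on every hello received and ignores timer jitter and processing delay; the function name and the millisecond values in the second call are hypothetical.

```python
from typing import Tuple


def hsrp_detection_window(hello_s: float, hold_s: float) -> Tuple[float, float]:
    """Return (shortest, longest) seconds from active-router failure to the
    standby's hold timer expiring.

    The hold timer expires hold_s after the LAST hello that arrived.
    - Failure just before the next hello was due: the last hello is already
      about hello_s old, so roughly hold_s - hello_s remains.
    - Failure just after a hello was sent: the full hold_s remains.
    """
    return (hold_s - hello_s, hold_s)


if __name__ == "__main__":
    # Defaults quoted above: hello 3 s, hold 10 s.
    shortest, longest = hsrp_detection_window(3.0, 10.0)
    print(f"defaults: takeover {shortest:.0f} to {longest:.0f} s after failure")

    # Hypothetical msec timers (250 ms hello, 750 ms hold), for comparison.
    shortest, longest = hsrp_detection_window(0.25, 0.75)
    print(f"msec timers: takeover {shortest:.2f} to {longest:.2f} s after failure")
```

Either way, HSRP alone accounts for at most about 10 seconds with the defaults, so a 28-second outage points to something else adding delay.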

50 seconds to resume sounds awfully like an STP (802.1d) timer being involved here.
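For reference, this is the arithmetic behind that hunch, using the standard 802.1D default timers (max age 20 s, forward delay 15 s). It is only a sketch of where the numbers 30 and 50 come from; it does not show that STP was the cause here, and the function name is made up for illustration.

```python
# Standard 802.1D default timers, in seconds.
MAX_AGE_S = 20
FORWARD_DELAY_S = 15


def stp_blocking_to_forwarding_s(max_age_s: int,
                                 forward_delay_s: int,
                                 wait_for_max_age: bool) -> int:
    """Seconds for an 802.1D port to go from blocking to forwarding.

    A port spends forward_delay_s in listening and again in learning.
    If the switch first has to age out stale BPDU information (an indirect
    failure), max_age_s is added in front of that.
    """
    listening_s = forward_delay_s
    learning_s = forward_delay_s
    total_s = listening_s + learning_s
    if wait_for_max_age:
        total_s += max_age_s
    return total_s


if __name__ == "__main__":
    direct = stp_blocking_to_forwarding_s(MAX_AGE_S, FORWARD_DELAY_S, False)
    indirect = stp_blocking_to_forwarding_s(MAX_AGE_S, FORWARD_DELAY_S, True)
    print(f"listening + learning: {direct} s")
    print(f"max age + listening + learning: {indirect} s")
```

With the defaults the two cases come to 30 s and 50 s, which is why a 50-second outage looks like max age plus two forward-delay periods.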

Jon

Jon Marshall · ‎08-26-2011

Florin

Have to agree with Peter on this. That is an incredibly general statement. Do you have any evidence to back it up?

Jon

A Abdul · ‎08-26-2011

HSRP may be old but it's still in play. As many have said already, it's the most widely used FHRP solution. As for the failover, enable preempt and use msec timers.

glen.grant · ‎08-26-2011

HSRP is all over our big corporate network and it works fine. Quite frankly, what's the issue whether it fails back over in 5 or 40 seconds, as long as it still has a working path...

Jon Marshall · ‎08-26-2011

Glen

Think that's the point. During the 28 and 50 secs there is no path available, i.e. no packet forwarding.

Jon

g.peart · ‎08-26-2011

I will do more debugging output for STP, HSRP and EIGRP to see the interactions and post it, but it will have to be later.

thanks

Jonathancert_2 · ‎08-26-2011

You have a long failover time. Interested in what you find out. I had a similar problem while using HSRP on routers: HSRP worked fine during a power failure, but there was a longer delay if the interface just went down. I love HSRP, especially on layer 3 switches.

Jonathan

g.peart · ‎08-26-2011

In the end I upgraded from c3550-ipservicesk9-mz.122.46.SE to 122.55 and it all started to work. I've included my debug standby outputs and some packet captures for reference; use Notepad to open them (UTF-8 encoded).

thanks