06-25-2013 07:46 PM - edited 03-04-2019 08:18 PM
Hi.
In a design where a hub router terminates ~100 GRE/IPsec tunnels (and still growing), we want to achieve high availability and fast convergence without overutilizing CPU and memory. Which is better to fine-tune: the GRE keepalive timers or the routing protocol timers? Is there a best practice for this, and what values would be recommended?
The routing protocol is EBGP running over the GRE/IPsec tunnels, and the hub router is an ASR1001.
Thanks,
Carlos.
06-25-2013 10:18 PM
Carlos
I faced a somewhat similar question with a customer who is running a pretty large hub-and-spoke network (with a bit more than 400 spokes). Our network differs in that there are two hub routers, each spoke has a tunnel to each hub, and our routing protocol was EIGRP rather than EBGP. We came to the conclusion that it was better to depend on the routing protocol for detecting failure and converging than to depend on the tunnel keepalives. In fact, we concluded that with the routing protocol detecting failures there was little benefit in running GRE keepalives, so we did not enable this feature.
I recognize that EBGP will not converge as quickly as EIGRP. But I believe that you would benefit more from tuning the routing protocol than you would from tuning the GRE keepalives.
HTH
Rick
06-26-2013 03:18 AM
Disclaimer
The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.
Liability Disclaimer
In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.
Posting
I agree with Rick. I would suggest tuning your IGP for faster convergence (such tuning might include more than just loss-of-neighbor detection).
However, you might still want to use GRE keepalives, as they will often take a tunnel interface "down". Depending on your network monitoring (if any), an interface going down might be easier for that monitoring to detect as a path outage than the loss of an IGP path. I.e., tune the IGP for fast convergence, and perhaps retain GRE keepalives for logical path failure monitoring.
06-26-2013 01:56 PM
Richard/Joseph,
Our design is also dual hub, and each spoke has a tunnel to each hub. As this is a service provider entry point to an MPLS network (for internet routers), it may grow to 400 spokes (almost the same as your topology), or even more.
Because of the mix of vendors at the spokes and the possibility of a large number of prefixes, we decided to use EBGP instead of an IGP (standard or proprietary).
We want to achieve fast convergence without overwhelming router resources (memory/CPU), and we want to avoid route flapping and instability caused by aggressive timers.
Questions:
1. If we decide to fine-tune the EBGP timers, what would be the best values to start with for the keepalive and hold timers, keeping in mind the large number of peerings (now and in the future)?
2. Of the two solutions (tuned GRE keepalives or tuned BGP timers), which is lighter on the hub router in terms of CPU/memory: fast GRE keepalives or more aggressive BGP keepalive and hold timers?
Thanks,
Carlos.
06-26-2013 05:10 PM
If I recall correctly, basic eBGP timers don't lend themselves to very fast detection of lost peers (eBGP is better suited to using the loss of a physical link as a trigger, which of course doesn't apply to a tunnel).
If supported, BFD might be the best "soft" way to detect neighbor loss.
You might try reducing GRE keepalives incrementally while watching CPU load. I've often used 1-second keepalives across GRE tunnels, but for fewer than 100 tunnels on a hub device.
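To put rough numbers on that incremental approach, here is a small Python sketch. It is back-of-the-envelope arithmetic only, not a measurement of ASR1001 CPU load. The function name is hypothetical, and it assumes a tunnel is declared down after `retries` consecutive missed keepalives and that the hub originates one keepalive per tunnel per interval. Keepalives originated by the spokes are ignored.

```python
def gre_keepalive_estimate(tunnels, interval_s, retries):
    """Rough GRE keepalive estimate (hypothetical helper, not a Cisco tool).

    Assumes the tunnel goes down after `retries` consecutive missed
    keepalives and the hub sends one keepalive per tunnel per interval.
    Returns (approx_detection_time_s, hub_keepalives_per_s).
    """
    detection_time_s = interval_s * retries
    hub_keepalives_per_s = tunnels / interval_s
    return detection_time_s, hub_keepalives_per_s


if __name__ == "__main__":
    # Step the interval down gradually, as suggested above, for the
    # current (~100) and projected (~400) tunnel counts.
    for tunnels in (100, 400):
        for interval_s in (10, 5, 2, 1):
            detect, rate = gre_keepalive_estimate(tunnels, interval_s, retries=3)
            print(f"{tunnels} tunnels, keepalive {interval_s}s x 3 retries: "
                  f"~{detect}s to detect, ~{rate:.0f} keepalives/s from the hub")
```

Under these assumptions, 1-second keepalives with 3 retries give roughly 3 seconds to detect a failure, at about 100 hub-originated keepalives per second for 100 tunnels and about 400 per second for 400 tunnels. How much CPU that actually costs on a given platform still has to be observed on the router itself.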
06-29-2013 12:19 PM
Carlos
Joseph makes an interesting point about side benefits of using GRE keepalive in taking the tunnel line protocol down. This could trigger alerts which could be useful if you need network control staff to take some manual action to repair the connection. Since the original question was about speeding up convergence I had not considered the operational control potential as a point in favor of GRE keepalive. But it is an interesting point to consider in choosing which alternative is best.
One thought that occurs to me is that taking down the tunnel line protocol would be very effective where the routing protocol is an IGP like OSPF or EIGRP. These protocols assume that routing peers are on a connected subnet, so if the interface goes down they immediately take down the peer relationship. BGP does not necessarily assume that peers are on a connected subnet, so I wonder whether GRE keepalives would be particularly effective in taking down the peer/neighbor relationship.
I am also of the opinion that tuning the BGP timer would probably be less overhead on the router than using GRE keepalive and tuning the keepalive timers. The BGP packets used to maintain the BGP neighbor relationship are already required in the router and tuning the timers just means that it will be performed a bit more frequently. But GRE keepalive is not something that is already being done. So using this feature will require CPU processing to generate the packets and bandwidth usage to send the packets that would not be required otherwise.
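The same kind of arithmetic can be applied to the BGP side. This is a hedged sketch, not a sizing tool: the function name is hypothetical, it assumes the worst-case detection time equals the negotiated hold time, and it assumes the hub sends one KEEPALIVE per peer per keepalive interval (UPDATE traffic is ignored). The 60/180 pair is the usual Cisco default; the tuned values are only illustrations, not recommendations.

```python
def bgp_timer_estimate(peers, keepalive_s, hold_s):
    """Rough BGP timer estimate (hypothetical helper).

    Assumes a silent peer is declared down when the hold timer expires
    and the hub sends one KEEPALIVE per peer per keepalive interval.
    Returns (worst_case_detection_s, hub_keepalives_per_s).
    """
    worst_case_detection_s = hold_s
    hub_keepalives_per_s = peers / keepalive_s
    return worst_case_detection_s, hub_keepalives_per_s


if __name__ == "__main__":
    # (keepalive, hold) pairs: the common default, then two tuned examples.
    for keepalive_s, hold_s in ((60, 180), (10, 30), (3, 9)):
        for peers in (100, 400):
            detect, rate = bgp_timer_estimate(peers, keepalive_s, hold_s)
            print(f"{peers} peers, timers {keepalive_s}/{hold_s}: "
                  f"up to {detect}s to detect, ~{rate:.1f} keepalives/s from the hub")
```

With 400 peers, moving from 60/180 to 10/30 raises the hub's keepalive rate from under 7 per second to about 40 per second. That is a change to traffic the router already generates, whereas GRE keepalives at any interval would be added on top of it.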
HTH
Rick