10-14-2011 01:43 PM - edited 03-07-2019 02:49 AM
Hello,
We ran into an interesting situation the other day that we'd like to correct. We have our sites connected to our WAN with serial connections and use BGP to do our routing. The routers are the default gateways for the local sites, and we have a WAN optimization device plugged in between the router and the switch. We had to reboot the optimization device at one of the sites and the routes dropped from BGP, even though the interface was down for less than a second. The ethernet interface is where all the connected routes we share are, so I can understand that if it was down for a while the subnets would disappear. I did some looking and it looks like if I disable "bgp fast-external-fallover" on the interface, it should keep the routes in the table until the timers expire. Is this correct? Is there another/better option I should be looking at?
Thanks!
11-11-2011 05:49 AM
I had a chance to test this morning. I did a "bgp fast-external-fallover disable" on the main ethernet interface and it still took things down for much longer than I think it should have. I removed the config from the interface and applied it globally in bgp. I'll be able to test again in a couple of weeks. Does anyone else have any other ideas?
01-23-2012 07:06 AM
I'm looking at this again and still don't have a good idea of how to prevent the routes from dropping. The interface with the connected routes will drop for about 2 seconds, twice. (The second drop comes after about 3-5 minutes.) I don't want those routes to leave the routing table during that time.
01-23-2012 07:20 PM
Hi,
Have you tried BGP dampening? Give it a try; it looks to be the most suitable solution for your issue.
Thanks
Vivek
01-24-2012 05:22 AM
Thanks for the response. I'll look into that. I changed the carrier-delay on the interface to 1 second hoping that would help. Basically the device on the ethernet side can do a fail to wire, and that was causing the interface to go down when really it was too fast to matter. (Or should have been.)
01-24-2012 05:33 AM
Oh yes, carrier-delay is good as well. One thing to keep in mind: if your outage is shorter than the time routing takes to reconverge, you may want to set your carrier-delay a little higher. Do let me know how things go with it.
Thanks
Vivek
01-24-2012 05:36 AM
I'll have to see if there's any way I can test it. I don't have a device on-site to test with so we usually end up "testing" during an upgrade or failure.
I'll post back here and share the results. (It could be a little while.)
01-24-2012 06:52 AM
Ecornwell,
Please correct me if I am wrong.
You've said that once you rebooted the WAN optimization device, the BGP routes were withdrawn from the IP routing table. Your goal is to keep those routes in the table when you reboot that device, right? My questions are:
1- How often do you reboot the WAN optimization device?
2- After the device is booted, can you see the BGP routes back or are they permanently withdrawn?
3- Have you tried to completely remove the WAN optimization device and go to the router directly?
BGP dampening won't benefit your situation, as this mechanism is used to prevent the router from advertising unstable routes to the internet. In my opinion, it is a temporary solution until you fix the source of the instability. Check your Layer 1, including wiring, physical interface, etc.
HTH
AM
01-24-2012 07:28 AM
Hi AM,
You are correct.
1 - Very rarely. We ran into a known bug where the device would essentially lock up and we would have to power cycle it to get it back. This would happen every couple of months. We've also noticed the same effect when we perform upgrades, but those are done after hours so they have much less impact.
2 - The routes come back after a couple minutes. (I haven't timed it because it's happened infrequently.) But, when the device powers back on, it switches back and drops the link again and it seems to happen just after the routes come back.
3 - No, but because of how infrequently it happens it isn't an issue.
The biggest problem is that we have a centralized call manager deployment, and what should be a quick bump that goes almost unnoticed causes the phones to fail and go into SRST mode, then come back, then go back into SRST mode, then come back. Our phones were down for about 7 minutes one day and I felt that shouldn't happen. The device does a very fast fail to wire, but the router was seeing it long enough to drop the routes.
Side note, I hadn't really looked at it from the switch side. (Trunked port) The logs show the port was down for 4 seconds. I'm wondering if I need something like the carrier-delay on both sides.
01-24-2012 10:32 AM
Try configuring "carrier-delay" in interface configuration mode with a value higher than 1 second, and configure EOT on the faulty interface with "carrier-delay" in tracking configuration mode, then let me know the status.
HTH
AM
01-24-2012 11:41 AM
Thanks. I set it to "5" on the ethernet interface.
I just looked at the EOT stuff and I'm not sure how it would apply in this case. (Note: I've never seen it before.)
01-24-2012 12:10 PM
This might help
http://www.cisco.com/en/US/docs/ios/12_4t/12_4t11/ht_eotcd.html
EOT will enable you to track an object such as a faulty interface. If the interface or the link fails, EOT detects it and uses the carrier delay timer that you configured in interface configuration mode.
You have to measure how long the faulty link stays down before it comes back again. Based on this information, set the carrier delay timer higher than your measured time. The goal is that if the link fails and comes back online before the carrier delay timer expires, the failed state is hidden. This way, the routes are not withdrawn from the routing table.
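To make the timer logic above concrete, here is a minimal Python sketch of that kind of "hold the down event" filter. It is only a model, not how IOS implements carrier-delay: the function name apply_carrier_delay and the event format are made up for illustration, and it assumes that only link-down events are held back while link-up events are reported immediately.

```python
from typing import List, Optional, Tuple

Event = Tuple[float, str]  # (time in seconds, "up" or "down"), sorted by time


def apply_carrier_delay(events: List[Event], delay: float) -> List[Event]:
    """Return the state changes the routing layer would see.

    Assumptions (illustrative only): the link starts up, a "down" is
    reported only after it has lasted `delay` seconds, and an "up" is
    reported immediately.
    """
    reported: List[Event] = []
    reported_state = "up"                 # what the routing layer believes
    pending_down: Optional[float] = None  # when a held "down" becomes visible

    for t, state in events:
        # The hold timer expired before this event: the outage is now visible.
        if pending_down is not None and pending_down <= t:
            reported.append((pending_down, "down"))
            reported_state = "down"
            pending_down = None

        if state == "down":
            if reported_state == "up" and pending_down is None:
                pending_down = t + delay  # start the hold timer
        else:  # "up"
            if pending_down is not None:
                pending_down = None       # flap was shorter than delay: hidden
            elif reported_state == "down":
                reported.append((t, "up"))
                reported_state = "up"

    # The link was still down at the end of the trace and the timer will expire.
    if pending_down is not None:
        reported.append((pending_down, "down"))
    return reported


# Two 2-second drops a few minutes apart, similar to what is described in this thread.
flaps = [(10.0, "down"), (12.0, "up"), (200.0, "down"), (202.0, "up")]
print(apply_carrier_delay(flaps, delay=5.0))  # both flaps hidden
print(apply_carrier_delay(flaps, delay=1.0))  # both flaps visible, 1 second late
```

With a delay longer than the measured outage, neither drop reaches the routing layer in this model; with a shorter delay, each drop is still seen, just later.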
HTH
AM
01-24-2012 01:32 PM
Thanks for the link. Reading it, it seems like that allows EOT to use carrier delay. Since I'm not currently using EOT, it doesn't seem like I need it. I'm going to give carrier delay a shot and see if that works.
..............................
So I realized we have a site that's not totally up yet, so I did some testing. (I was able to force a physical bypass with the equipment.) With a carrier delay of 5, I lost 1 ping and the routes never left our core. With no carrier delay, we'd lose 8-10 pings and the routes would not be present at the core. The only thing that's mildly frustrating is the fact that the phones still registered in SRST mode. The interface still showed that it went down. It seemed to be delayed by the carrier delay. (I adjusted the time to 10 seconds and it was down 10 seconds; with 5 it was 5 seconds.) I'd really love to tell the router, "If the interface is down for less than 2 seconds, pretend it didn't go down."
01-24-2012 04:58 PM
It seems something is misconfigured here.
I will test the EOT support for carrier delay tomorrow and will let you know the status. For the SRST and IP phones part, I really can't help you and hope someone can shed some light on it.
Also, I would be glad if you could send me your network layout for further investigation.
HTH
AM