Re: OSPF Convergence

Ganesh Devarshetty · ‎08-10-2023

Which method is most suitable for fast OSPF convergence?

1. Interface shutdown.

2. Graceful shutdown.

3. Remove the network command.

Kindly help.

M02@rt37 · ‎08-10-2023

Hello @Ganesh Devarshetty,

From my point of view:

Graceful shutdown is a process where a router informs its neighboring routers about its intention to shut down a specific interface. While graceful shutdown is a good practice for minimizing disruption, it is not directly aimed at achieving fast OSPF convergence.

As for removing the network command from the OSPF configuration, it will cause OSPF to stop advertising routes for the specified network, but it doesn't directly influence the speed of OSPF convergence.

When you shut down an interface, OSPF can quickly detect the failure and initiate the SPF algorithm to recalculate the routing table. This allows OSPF to converge faster because it immediately recognizes the link as failed and triggers the recalculation process.

In my view, the "Interface shutdown" method is the most suitable approach for achieving fast OSPF convergence. It ensures that OSPF adjacencies are quickly removed and triggers rapid recalculation of the routing table, leading to faster convergence in the network.

Best regards

Joseph W. Doherty · ‎08-10-2023

OSPF convergence speed, and what exactly you're trying to accomplish, is an interesting and somewhat "deep" subject, especially when dealing with Cisco OSPF L3 devices, depending on their (proprietary) OSPF features.

Those 3 options all work a bit differently in how, and how long, it takes OSPF to recognize there's a change in topology, and the differences are not just between the options themselves.

For example, #1, shutting an interface, is often a "fast" trigger to "know" there's a topology change, at least on the device with said interface, but what about the OSPF neighbor? That device's topology change detection time is often just as important as the local device. Consider R1 <> R2 vs. R1 <> S1 <> R2; shutdown interface on R1, any potential difference in R2 detecting the R1 interface went down between those two topologies?
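As a rough illustration of the R1 <> R2 vs. R1 <> S1 <> R2 question, here is a minimal sketch of the worst-case detection time on R2. It assumes default timers (hello 10 s, dead 40 s), no BFD or fast hellos, and that a router notices its own port going down almost instantly; the function and variable names are made up for the example.

```python
# Rough model of how long R2 needs to notice that R1's interface was shut.
# Assumptions (not from the thread): hello = 10 s, dead = 4 x hello,
# no BFD or fast hellos, and local link-down events are seen right away.

HELLO_S = 10
DEAD_S = 4 * HELLO_S  # common default: dead interval is four hello intervals


def worst_case_detection_s(link_goes_down_locally: bool, dead_s: int = DEAD_S) -> int:
    """Return the worst-case seconds before a router declares the neighbor lost."""
    # A router whose own port drops reacts to the link event immediately;
    # otherwise it can only wait for the dead timer to expire.
    return 0 if link_goes_down_locally else dead_s


# R1 <> R2: shutting R1's port also takes the link down on R2's port.
direct = worst_case_detection_s(link_goes_down_locally=True)

# R1 <> S1 <> R2: R2's port to switch S1 stays up, so R2 only sees hellos stop.
via_switch = worst_case_detection_s(link_goes_down_locally=False)

print(f"R2 detects the change in ~{direct} s (direct) or up to {via_switch} s (via S1)")
```

Shorter hello/dead timers or a feature such as BFD would shrink the second number, which is why the features in use matter when comparing the options.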

Possibly, for #3, the local device's OSPF might start re-convergence even faster than with a shut-down local interface, because the latter often has a "bounce" delay. But, to an OSPF neighbor, it's probably much like R1 <> S1 <> R2.

For #2, much depends on what exactly it does. As it's an explicit action, it might start convergence as fast as #3, on the local device, but signal a neighbor about as fast, or faster, than #1.

So, without much more detail about the topology and the OSPF features being used, you cannot really rate which would be "faster" to trigger OSPF convergence.

Again, there's much to OSPF convergence, especially using Cisco's OSPF implementations.

On the subject of convergence, there's also "good" convergence and "bad" convergence, and sometimes, when doing maintenance, you can do "good" vs. "bad", in the same situation.

For example, given: (WAN) <GE[fiber]> R1 <GE> R2 <GE> R3 <FE[fiber]> (WAN)

Let's suppose R2 is the LAN core, and all traffic to/from WAN is using R1 as the "better" OSPF path.

However, we're seeing errors on R1's gig fiber interface, so we want to pull the optic, clean it, and reinsert it.

What might you do to OSPF, if anything, before doing the foregoing? (Hint: consider convergence.)

Ganesh Devarshetty · ‎08-10-2023

We are planning a DC shutdown activity.
Out of the 3 options mentioned above, which is the best one to perform on the DC router before shutdown, so that OSPF converges faster without disturbing the end user (A,B,C) workload?

I have attached the topology for the same.

MHM Cisco World · ‎08-11-2023

I sent you a message; please check it.

Joseph W. Doherty · ‎08-11-2023

"best option"

Would be none of the above (i.e. the 3 choices).

However, if you were to limit yourself to just those, likely "graceful shutdown" would be the best of those 3 options, especially as I don't have enough information.

If your goal truly is convergence "without disturbing the end user (A,B,C) workload", then you should consider "good" convergence vs. "bad" convergence. (BTW, "good" and "bad" are my terms, basically for zero ["good"] interruption [to traffic] convergence vs. [possibly] some ["bad"] interruption [to traffic] convergence.)

With "good" convergence, the "speed" of convergence generally doesn't matter.

Understand, "good" convergence requires configuration changes both before and while "routers" are down. But, again, in return it gives you no impact to your network's traffic.

If you want a further description of the concept, let me know. Otherwise, also again, "graceful shutdown" is likely your best choice.

Giuseppe Larosa · ‎08-11-2023

Hello @Ganesh Devarshetty ,

Have a look at the following link, which explains OSPF shutdown and compares interface graceful shutdown with interface shutdown:

https://itskillbuilding.com/networking/network/ospf/ospf-graceful-shutdown/#:~:text=OSPF%20graceful%20shutdown%20is%20a%20technique%20for%20taking,to%20which%20each%20gracefully%20shut-down%20interface%20is%20connected.

It looks like OSPF graceful shutdown at process level may be the right tool for you.

Also, the OSPF stub router feature (max-metric router-lsa), which advertises the router's links in its Router LSA with the maximum metric 65535, can be used to keep a router speaking OSPF without it being used for transit traffic.

However, how do the customer sites communicate with the DC and DR sites? Do they also use OSPF, or do they use eBGP sessions?
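To show why the maximum metric keeps a router out of the transit path, here is a small sketch using plain Dijkstra on a made-up four-node topology (A, R, X, B and all costs are hypothetical, not taken from the attached diagram). Costs are per direction, as in OSPF, so only the links that R itself advertises are raised.

```python
import heapq

MAX_METRIC = 65535  # value advertised on the stub router's links


def shortest_path(graph, src, dst):
    """Plain Dijkstra over a directed graph {node: {neighbor: cost}}."""
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nbr, link_cost in graph[node].items():
            if nbr not in visited:
                heapq.heappush(queue, (cost + link_cost, nbr, path + [nbr]))
    return None


# Hypothetical topology: R is the cheap transit router, X the expensive backup.
graph = {
    "A": {"R": 1, "X": 10},
    "R": {"A": 1, "B": 1},
    "X": {"A": 10, "B": 10},
    "B": {"R": 1, "X": 10},
}
before = shortest_path(graph, "A", "B")

# R now advertises every one of its own links with the maximum metric.
stub = {node: dict(links) for node, links in graph.items()}
stub["R"] = {nbr: MAX_METRIC for nbr in graph["R"]}
after = shortest_path(stub, "A", "B")
to_r = shortest_path(stub, "A", "R")

print(before, after, to_r)
```

In this sketch transit traffic moves to X while R itself stays reachable at its old cost, which matches the "speak OSPF but not be used for transit" behaviour described above.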

In any case, you should ask for a maintenance time window for safety.

Hope this helps

Giuseppe

Joseph W. Doherty · ‎08-11-2023

@Giuseppe Larosa wrote:

However, how do the customer sites communicate with the DC and DR sites? Do they also use OSPF, or do they use eBGP sessions?

Good question.

A similar question: will access to the whole site be cut, i.e. will clients need to "transparently" switch to the DR site's resources? If so, might that interrupt client workflows?

BTW, as I noted early on, there's much to achieving the fastest OSPF convergence, but except for default OSPF hello timers (which might take up to 40 seconds to "see" the loss of an adjacent OSPF neighbor), Cisco's default convergence-related "things" aren't too slow. Even in @Giuseppe Larosa's reference, the ping loss was just a few seconds. (However, when supporting VoIP, and you don't want a VoIP call to drop, or even "blip" a syllable of conversation, then we work to achieve sub-second convergence. [Incidentally, this can be difficult to achieve.])

BTW, all of the methods in your OP are likely "bad" convergence techniques.

Although Giuseppe's reference has:

"The shutdown command deactivates OSPF in the least disruptive way while notifying neighbors of the current change in order to reroute IP prefixes served by the current router to alternate next hops."

I don't believe it's actually the least disruptive way, but it might be good enough.

Remember, although the reference showed no ping drops, the pings were also sent 2 seconds apart and the R1<>R4 interface was still physically up. I.e. while OSPF is recalculating the new path (which is often sub-second, let alone that this example is a trivial OSPF topology), packets might still be going across the link before the route table was updated and/or packets might have looped between routers, but not enough to be dropped, until the whole topology re-converged. (Also note: the ping run with the drops doesn't show the completion stats, and the ping run with the drops has: "round-trip min/avg/max = 20/32/56 ms" - an interesting spread of RTT times, some, though, likely due to the slower and longer path after rerouting.)

Now I keep mentioning "good" convergence. Basically, it's very simple, logically, to accomplish, but it can take some "work". Move traffic because there's a better path, not because the current path is "broken" or "worse".

Going back to my: (WAN) <GE[fiber]> R1 <GE> R2 <GE> R3 <FE[fiber]> (WAN)

R1 is the active/preferred way to the WAN. We want to switch it over to using R3.

By, somehow, telling the network to stop using R1, R3 should take over.

If we do "something" on R1, it, more-or-less, knows we want it to no longer use the WAN connection. But for the network to converge, each router has to re-compute the OSPF topology, and most do this by getting the "bad" news, i.e. don't use R1.

If we shut R1's WAN interface, any traffic using R1 is immediately blocked, while R1 is re-computing how to get out to the WAN (i.e. via R2 and R3).

R1 will notify R2 that its WAN interface link is gone, and R2 will re-compute how to get to WAN (via R3) and also notify R3 of the topology change.

R2, though, may lag R1. I.e. it may still forward traffic to R1 before it's notified by R1 and/or while it re-computes its path to the WAN. Further, if R1 tries to send WAN packets to R2, R2 may just loop them back to R1 (i.e. a transient routing loop). The same issue occurs with R3, as it too will likely lag. This is the "classic" bad news problem of dynamic routing. Until all routers have re-converged, they will misdirect packets.
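A toy model of that lag, as a sketch only: each router holds a single next hop toward the WAN, R1 has already re-converged after its WAN interface was shut, and R2 is still using its old entry. The table layout, the TTL value and the `forward` helper are assumptions for illustration, not how any router implements forwarding.

```python
def forward(tables, start, ttl=8):
    """Follow next hops toward the WAN; return (hops, delivered)."""
    hops, node = [start], start
    while ttl > 0:
        nxt = tables[node]
        hops.append(nxt)
        if nxt == "WAN":
            return hops, True
        node = nxt
        ttl -= 1
    # TTL ran out before the packet reached the WAN: it is dropped.
    return hops, False


# R1 has re-converged (WAN via R2), but R2 still points at R1: "bad news" lag.
stale = {"R1": "R2", "R2": "R1", "R3": "WAN"}

# Once R2 has also re-converged, traffic leaves via R3.
converged = {"R1": "R2", "R2": "R3", "R3": "WAN"}

stale_hops, stale_ok = forward(stale, "R2")
good_hops, good_ok = forward(converged, "R1")

print("stale:", " > ".join(stale_hops), "delivered" if stale_ok else "dropped")
print("converged:", " > ".join(good_hops), "delivered" if good_ok else "dropped")
```

In the stale state the packet bounces between R1 and R2 until its TTL expires; the loop disappears only after R2 catches up.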

The alternative traffic-shifting approach is to make the alternate path the "better" path. If we set R3's WAN interface to be better than R1's WAN interface, eventually other routers will get the "good" news and revise their routing tables to the "better" path. As the prior path still works, traffic going that way is still delivered. No traffic should be looped or lost during the traffic shift, so "fast" convergence isn't an issue.

Basically, this is the converse of "shutdown". Rather than giving all interfaces leaving the "shutdown" device the highest metric, we set up an alternative "better" path.
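The same toy next-hop model can sketch the "good news" case. Assumptions, none of which come from the thread: R3 initially also prefers the exit via R1, both WAN exits stay up throughout, and routers update in the order the change reaches them (R3 first, then R2, then R1). A different update order is not modelled here.

```python
def delivered(tables, start, max_hops=8):
    """Return True if a packet from `start` reaches the WAN within max_hops."""
    node = start
    for _ in range(max_hops):
        node = tables[node]
        if node == "WAN":
            return True
    return False


# Next hop toward the WAN before and after R3's exit is made "better".
OLD = {"R1": "WAN", "R2": "R1", "R3": "R2"}
NEW = {"R1": "R2", "R2": "R3", "R3": "WAN"}

# Assumed order in which routers learn of, and act on, the change.
UPDATE_ORDER = ["R3", "R2", "R1"]

tables = dict(OLD)
results = [all(delivered(tables, r) for r in tables)]  # state before any update
for router in UPDATE_ORDER:
    tables[router] = NEW[router]
    # Both exits still work, so check every router at this intermediate step.
    results.append(all(delivered(tables, r) for r in tables))

print(results)
```

Under those assumptions every intermediate state still delivers, because a router that has not yet updated keeps using an exit that still works.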

Again, this takes some work, as you have to ensure your higher-cost alternative path is now seen as "better". One issue you often bump into is whether your OSPF cost settings allow you to decrease the metric enough on the alternate path to make it "better". (If not, then you need to first adjust your alternate path's costs higher, then your primary path's costs higher, so you can then set your alternative path's costs lower. [Did I mention this can be a lot of work?])

Another issue: when we want to bring the former best path back on-line, before we do, we want to return the alternate path to its prior values.

A further issue: don't forget you also need to adjust the other side's router(s) to prefer the alternative as a "better" path. (This can be lots of work.)
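The cost headroom problem is simple arithmetic. The sketch below assumes the lowest configurable interface cost is 1 and uses made-up path costs as seen from R2; `can_win_by_lowering` is a hypothetical helper, not a router feature.

```python
MIN_COST = 1  # assumed lowest configurable OSPF interface cost


def can_win_by_lowering(primary_total, alt_fixed, min_cost=MIN_COST):
    """True if dropping the adjustable alternate link to min_cost beats the primary.

    primary_total: total cost of the current best path
    alt_fixed:     cost of the alternate path excluding the link we can adjust
    """
    return alt_fixed + min_cost < primary_total


# Hypothetical costs from R2: primary = R2>R1 (1) + R1 WAN link (1) = 2,
# alternate = R2>R3 (1) + R3 WAN link (adjustable).
primary_total = 2
alt_fixed = 1
no_headroom = can_win_by_lowering(primary_total, alt_fixed)  # 1 + 1 is only a tie

# Make room: raise the alternate's WAN cost first, then the primary's, so the
# primary stays the best path throughout and no traffic moves yet.
raised_primary_total = 1 + 101
with_headroom = can_win_by_lowering(raised_primary_total, alt_fixed)

print(no_headroom, with_headroom)
```

With the original costs the best the alternate can do is tie; after both paths are shifted upward, lowering the alternate's WAN cost can make it strictly better.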

Because of all the work this can entail, for most networking situations, something like the graceful shutdown approach is often "good enough".

But I think it's worth knowing that distributing path changes that are "better" is "better".

BTW, without doing the above to migrate traffic, when you bring the "primary" back on-line, as it's "better" you don't bump into the transient re-convergence issues. Again, "better" is "better". ; )

Ganesh Devarshetty · ‎08-11-2023

Thanks for sharing the knowledge.

The customer routes are learned via eBGP and redistributed into OSPF.