EIGRP backup routes, variance, holdtime expiry, and route removal

ipcruiser81 · 06-21-2015

Hi Folks,

Could you answer this for me please since I am lacking clarity on EIGRP operation. All sims done on GNS3.

1) I have a feasible successor route in my EIGRP topology table as a backup to my successor route. However, when the interface on which the successor route is learned goes down (shut down manually), the backup route is not used until holdtime expiry (and therefore there is packet loss)! Is it possible to have fast failover to the backup route so that there is no packet loss, without amending the holdtimes?

2) I increased the variance for an EIGRP AS. Now the routing table has dual routes (through different paths) to the same destination. I shut down a neighbor interface during continuous pings to the destination, and once again packets continued to be dropped until holdtime expiry.

Is it not possible to have failover to backup route in the topology table or a load balanced route in the routing table without packet loss?

How soon is a route removed from the routing table when an interface goes down and how soon does EIGRP realise loss of this route? It would appear EIGRP entirely depends on its holdtime expiry before determining or doing anything about the destination.

Regards

Peter Paluch · 06-22-2015

Hi,

1) I have a feasible successor route in my EIGRP topology table as a backup to my successor route. However, when the interface on which the successor route is learned goes down (shut down manually), the backup route is not used until holdtime expiry (and therefore there is packet loss)! Is it possible to have fast failover to the backup route so that there is no packet loss, without amending the holdtimes?

I am very surprised at this behavior, as this is most certainly not the default EIGRP behavior. Can you perhaps post the configuration of your EIGRP process and of the interface you are shutting down? Or, ideally, can you post the entire configuration of your device after removing sensitive information? Definitely, this issue requires very close investigation as it contradicts the normal EIGRP behavior.

Also, the show ip route X.X.X.X and show ip eigrp topology X.X.X.X for the network and netmask in question would be highly helpful.

In general, when an interface to an EIGRP neighbor is shut down, EIGRP is supposed to react immediately. There should be no waiting for the Hold timer to expire. Waiting for a neighbor to expire would be in order if it was the neighbor's interface that was shut down, not the current router's.
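To make the expected behavior concrete, here is a minimal Python sketch of the local failover decision. It is a toy model, not IOS code: the `Path` class, the neighbor and interface names, and all metric values are made up for illustration. It applies the feasibility condition (a neighbor's reported distance must be lower than the current feasible distance) and promotes the best feasible successor as soon as the successor's interface goes down, with no timer involved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Path:
    neighbor: str            # hypothetical neighbor name
    interface: str           # local egress interface toward that neighbor
    reported_distance: int   # metric the neighbor advertises (RD)
    total_metric: int        # RD plus the cost of the link to the neighbor


def feasible_successors(paths: List[Path], feasible_distance: int) -> List[Path]:
    """Paths that satisfy the feasibility condition: RD < FD."""
    return [p for p in paths if p.reported_distance < feasible_distance]


def failover(paths: List[Path], feasible_distance: int,
             down_interface: str) -> Optional[Path]:
    """Pick the replacement path after a local interface goes down.

    Returns None when no usable feasible successor is left; real EIGRP would
    then put the route into the active state and query its neighbors, which
    is not modeled here.
    """
    candidates = [
        p for p in feasible_successors(paths, feasible_distance)
        if p.interface != down_interface
    ]
    return min(candidates, key=lambda p: p.total_metric, default=None)


# Made-up topology table entry for one destination.
paths = [
    Path("R2", "Serial0/0", reported_distance=1000, total_metric=2000),  # successor
    Path("R3", "Serial0/1", reported_distance=1500, total_metric=3000),  # feasible successor
    Path("R4", "Serial0/2", reported_distance=2500, total_metric=2800),  # fails RD < FD
]
fd = 2000  # feasible distance, here the metric of the current successor

backup = failover(paths, fd, down_interface="Serial0/0")
print(backup.neighbor if backup else "no feasible successor, route goes active")
# prints "R3"
```

Note that R4 is skipped even though its total metric is better than R3's: its reported distance is not lower than the feasible distance, so it is not a feasible successor.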

Looking forward to reading your response.

Best regards,
Peter

ipcruiser81 · 06-24-2015

Hi Peter!

Apologies for the delay and thanks for the response.

Let's say we have a point-to-point link between two routers on GNS3. If one end of the link is shut down, the other end will remain up, interface and line protocol! This is what threw off the test. I don't have Cisco hardware to test failover to a backup route from the topology table. I am guessing that it is instantaneous with no packet loss.

You've said that if it is the neighbor's interface that is shut down, then I should expect hold time expiry? This would be the case where the neighbors are not directly connected and perhaps have an L2 device in between, correct?

Finally, what triggers a route to be removed from a routing table? If you can point me to Cisco docs, that would help too. Thanks again!

Rgds

Peter Paluch · 06-24-2015

Hi,

Oh, you're working in GNS3. Now I understand what's happening.

If one end of the link is shut down, the other end will remain up, interface and line protocol! This is what threw off the test.

Yes, that's right. The router on which the interface was shut down should react instantaneously because it knows about its deactivated interface right away. However, the neighboring device may not notice the link going down. In GNS3, there is no emulation of the physical layer so even if one device on a link goes down, the other device won't notice this as a linkdown event. As a result, the other device will wait for some kind of timeout to react - in EIGRP terminology, this is the Hold time, usually set to 15 seconds.
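As a rough worked example of how long the neighboring device waits, the sketch below computes the window between the failure and Hold timer expiry. The 15-second Hold time comes from the text above; the 5-second Hello interval is an assumption (the default that normally accompanies a 15-second Hold time), and the function name is made up for illustration.

```python
from typing import Tuple


def remote_detection_window(hello_interval: float, hold_time: float) -> Tuple[float, float]:
    """Return (best_case, worst_case) seconds from the failure to Hold expiry.

    The Hold timer restarts on every received Hello. If the link fails just
    before the next Hello was due, only hold_time - hello_interval remains;
    if it fails right after a Hello arrived, the full hold_time remains.
    """
    if hold_time < hello_interval:
        raise ValueError("hold time should not be shorter than the hello interval")
    return (hold_time - hello_interval, hold_time)


# Assumed Hello interval of 5 s with the 15 s Hold time mentioned above.
best, worst = remote_detection_window(hello_interval=5.0, hold_time=15.0)
print(f"neighbor declared down {best:.0f} to {worst:.0f} s after the failure")
# prints "neighbor declared down 10 to 15 s after the failure"
```

So in a GNS3 lab with these timers, pings through the side that did not see the link go down can be expected to fail for roughly 10 to 15 seconds.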

As you have correctly mentioned, this would also happen with real routers if the connection between them went through intermediary Layer1 or Layer2 devices - media converters, modems, repeaters, hubs, bridges, switches, access points.

I don't have Cisco hardware to test failover to a backup route from the topology table. I am guessing that it is instantaneous with no packet loss.

It would be close to instantaneous, but a short outage and some corresponding packet loss would still occur. One obstacle to near-immediate switchover to a backup path is the speed with which the interface can reliably detect and report a link failure. With common Serial or Ethernet interfaces, this can take up to hundreds of milliseconds. In addition, Cisco IOS has a feature that intentionally delays the processing of information about an interface going down. It is called the carrier-delay, and it is used to mask transient, very short-lived flaps in interface connectivity so that they don't impact routing. By default, this delay is set to 2 seconds, so IOS is informed that an interface went down, and starts reacting, only 2 seconds after the event. For fast convergence, this timer is often reduced to a sensible minimum.

And as if that weren't enough, some Cisco router platforms (such as the 1841) periodically poll the status of their interfaces instead of being informed about an interface state change via an interrupt. The polling period is on the order of seconds (1 - 2 seconds). This means that even if you decrease the carrier-delay to 0, the router will discover the link-down event only at its next polling turn.
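The delays described here can be added up in a small back-of-the-envelope model. This is only a sketch under simplifying assumptions: that the carrier-delay is simply added on top of the detection or polling wait, that a polled platform waits anywhere from zero to one full polling interval, and that 200 ms stands in for "hundreds of milliseconds". The function and parameter names are made up for illustration.

```python
from typing import Optional, Tuple


def local_reaction_delay_ms(detect_ms: int, carrier_delay_ms: int,
                            poll_interval_ms: Optional[int] = None) -> Tuple[int, int]:
    """Return (min, max) milliseconds before IOS starts reacting to a local link failure.

    Assumed model: interrupt-driven platforms pay the hardware detection time;
    polled platforms instead wait between zero and one polling interval.
    The carrier-delay is added on top in both cases.
    """
    if poll_interval_ms is None:
        total = detect_ms + carrier_delay_ms
        return (total, total)
    return (carrier_delay_ms, poll_interval_ms + carrier_delay_ms)


# Interrupt-driven interface, default carrier-delay of 2 seconds.
print(local_reaction_delay_ms(detect_ms=200, carrier_delay_ms=2000))   # (2200, 2200)

# Same interface with the carrier-delay tuned down to 0.
print(local_reaction_delay_ms(detect_ms=200, carrier_delay_ms=0))      # (200, 200)

# Polled platform (2-second poll assumed) with the carrier-delay at 0.
print(local_reaction_delay_ms(detect_ms=200, carrier_delay_ms=0,
                              poll_interval_ms=2000))                  # (0, 2000)
```

The last case shows why tuning the carrier-delay alone does not help on a polled platform: the worst case is still a full polling interval.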

It is actually quite hard to configure a router to react immediately to a link state change. Usually, with good tuning, you can bring down the reaction time to tens or hundreds of milliseconds.

You've said that if it is the neighbor's interface that is shut down, then I should expect hold time expiry?

Yes - suppose you have R1 and R2 connected together. R2 shuts down its interface toward R1. Depending on how they are connected together, R1 may not notice that R2's interface went down, so it basically doesn't have a clue that something happened until the Hold time expires. Situations in which R1 doesn't notice the interface failure on R2 are generally caused by intermediary Layer 1/2 devices on the link between R1 and R2, as mentioned earlier.

In reality, this may be a problem because you obviously want the routers to know about the link failure as soon as possible. You could decrease the Hello and Hold timers, but the smallest allowed Hold time is usually 1 second and, more seriously, the router's CPU load rises significantly if you have many EIGRP neighbors with whom you need to exchange Hellos one or more times per second. Therefore, instead of tuning the routing protocol timers, another protocol for fast liveness detection is used: Bidirectional Forwarding Detection, or BFD. BFD is basically a fast hello/keepalive mechanism running between neighboring routers, and if BFD declares a neighbor failure (much sooner than the Hold time in EIGRP), EIGRP will react as if the Hold time expired. This protocol is lightweight enough to be implemented even in the hardware of some linecards so that the router's CPU is not bothered even with very frequent BFD Hellos (one every, say, 50 ms). On lower-end routers, BFD runs in software only, so it's still possible to overload the CPU if it is not used judiciously.
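To put numbers on the trade-off, here is a small Python sketch comparing detection time and keepalive load. The 50 ms interval comes from the text above; the multiplier of 3 (the number of consecutive missed packets before a failure is declared) and the count of 100 neighbors are assumptions chosen for illustration, and the function names are made up.

```python
def bfd_detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """A failure is declared after `multiplier` consecutive intervals with no packet."""
    return interval_ms * multiplier


def keepalives_per_second(neighbors: int, interval_ms: int) -> float:
    """Packets per second the router must send to keep `neighbors` sessions alive."""
    return neighbors * 1000 / interval_ms


# Detection: BFD at 50 ms x 3 versus waiting for a 15 s EIGRP Hold time.
print(bfd_detection_time_ms(interval_ms=50, multiplier=3))   # 150 (ms)

# Load with 100 neighbors (hypothetical): 1 s Hellos versus 50 ms BFD packets.
print(keepalives_per_second(neighbors=100, interval_ms=1000))  # 100.0
print(keepalives_per_second(neighbors=100, interval_ms=50))    # 2000.0
```

The second pair of numbers illustrates why hardware offload matters: the faster the keepalives, the more packets per second someone has to generate and process.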

Finally, what triggers a route to be removed from a routing table? If you can point me to Cisco docs, that would help too.

I am not sure I have ever come across such a document. However, let's try to simply deduce the answer.

A route in the routing table can be removed for any of these reasons:

1) The routing source (routing protocol, static configuration) that added the route to the routing table decides to remove it again. Most often, this is caused by the routing source receiving information that the route is no longer reachable and that no replacement path exists. For EIGRP, this would mean receiving the route with an infinite metric from the current successor and, after the diffusing computation has started and finished, finding no replacement path. Also, an EIGRP/OSPF/IS-IS/BGP adjacency to a neighbor going down is a reason to remove its routes from the routing table, or more precisely, it is a reason to consider all networks learned from that neighbor as unreachable and start looking for replacement routes.
2) A garbage collector evicts the route from the routing table because of its exceedingly high age. To the best of my knowledge, this is only used in RIP, and the timer that drives the garbage collection is called the Flush after timer, 240 seconds by default.
3) The route in the routing table has become invalid because either the next hop became unresolvable (the path to the next hop itself can no longer be identified in the routing table), or the egress interface has become inoperable (its state is no longer up/up).
4) The presence of the route in the routing table was tied to a so-called tracking object whose state has changed to Down, so the route is no longer permitted to be installed in the routing table.
5) The route's Administrative Distance has changed to 255. This is usually caused by deliberate configuration, using the distance command in a routing protocol's configuration to manipulate the trustworthiness of selected routes from selected neighbors.
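The reasons above can be summarized as a checklist in a short Python sketch. This is a toy model of the reasoning in this post, not how IOS implements its routing table: the `RouteState` class, its field names, and the reason strings are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RouteState:
    withdrawn_by_source: bool = False          # routing source removed the route
    age_s: Optional[int] = None                # route age, for RIP-style aging only
    flush_after_s: Optional[int] = None        # e.g. 240 for RIP; None = no aging
    next_hop_resolvable: bool = True
    interface_up_up: bool = True
    tracked_object_up: Optional[bool] = None   # None = no tracking object attached
    admin_distance: int = 90                   # 90 = EIGRP internal routes


def removal_reasons(route: RouteState) -> List[str]:
    """Return every reason from the list above that applies to this route."""
    reasons = []
    if route.withdrawn_by_source:
        reasons.append("withdrawn by routing source")
    if (route.age_s is not None and route.flush_after_s is not None
            and route.age_s >= route.flush_after_s):
        reasons.append("flushed because of age")
    if not route.next_hop_resolvable:
        reasons.append("next hop unresolvable")
    if not route.interface_up_up:
        reasons.append("egress interface not up/up")
    if route.tracked_object_up is False:
        reasons.append("tracking object down")
    if route.admin_distance == 255:
        reasons.append("administrative distance 255")
    return reasons


print(removal_reasons(RouteState()))                          # []
print(removal_reasons(RouteState(interface_up_up=False)))     # ['egress interface not up/up']
print(removal_reasons(RouteState(age_s=240, flush_after_s=240, admin_distance=120)))
# ['flushed because of age']
```

An empty list means the route stays installed; any non-empty result means it would be removed, and the routing source would then look for a replacement.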

Feel welcome to ask further!

Best regards,
Peter

ipcruiser81 · 06-24-2015

Great! Thank you, sure to have more questions, see you around!