Plamen Mladenov
Beginner

ASR9001 - Quarantined 0.0.0.0/0 route

Hello,

 

We have a very simple setup:

 

The setup:

 

In an MPLS environment, the PE1 router (ASR9001, IOS-XR 6.6.3) peers with an upstream Internet provider, ISP1, over a standard single-hop eBGP session and receives the full Internet table plus a default route from ISP1 into the default routing table (not a VRF).

The same PE1 router forms MP-IBGP sessions with a few dedicated BGP route reflectors (RRs). For the IPv4 AFI it sends the default route (learnt from ISP1) to the RRs with next-hop-self, along with a few other customers' routes that are not related to ISP1. In the opposite direction, all RRs advertise 0.0.0.0/0 to PE1, coming from different Internet-facing PEs and different ISPs.

There is no manipulation of BGP attributes for the 0.0.0.0/0 prefix (except next-hop-self on the PEs), so PE1 selects the externally learnt 0.0.0.0/0 route from its eBGP session to ISP1. All RRs send "Additional Paths" to all PEs. BGP PIC Edge is enabled on the PEs (including PE1).

 

The problem:

 

All good so far, until there is an ISP DIA circuit failure (there is a working BFD session between PE1 and ISP1). I simulated the failure by manually disabling the physical interface between PE1 and ISP1 (on the PE1 end).

After that, a few things happened:

  • The BFD and eBGP sessions went down immediately ("CEASE notification sent - administrative shutdown"), as expected.
  • The router logged a suspicious message about a recursion loop for its eBGP peer IP address:
    • RP/0/RSP0/CPU0: ipv4_rib[1197]: %ROUTING-RIB-7-SERVER_ROUTING_DEPTH : Recursion loop looking up prefix [ISP1 IP address] in Vrf: "default" Tbl: "default" Safi: "Unicast" added by bgp
  • 0.0.0.0/0 became quarantined on PE1 (no other prefixes were in the quarantined state).
  • The BGP process on PE1 did NOT send a withdraw message for the 0.0.0.0/0 prefix to the route reflectors (I ran “debug ip bgp updates PE1-Loopback0-IP” on one of the route reflectors, and while 0/0 was quarantined no update was received from PE1).
    • Other, non-Internet-facing PEs were still using PE1 as the best path for 0/0 (although there is a valid backup path via another PE).
    • PE1 was blackholing traffic (the route to 0/0 was quarantined).
  • 2 minutes and 25 seconds later, PE1 removed 0/0 from the quarantined state, installed the 0/0 route coming from PE2 (the previous backup route), and sent a BGP withdraw message to the route reflectors, which was propagated to the rest of the PEs.

I haven't found much about RIB quarantining; it looks like some kind of protection mechanism against route oscillations. I checked and confirmed that the IGP is completely stable and that no BGP updates were going to the RRs (as I wrote, only the default route and a few internal routes are sent to the RRs, NOT the full Internet BGP table). ISP1 does NOT advertise the P2P /30 subnet into BGP, nor does PE1; however, ISP1 advertises a larger /9 block (and 0/0). I tried to disable the default RIB dampening mechanism with:

 

router rib
address-family ipv4 (hidden command)
   next-hop dampening disable

but nothing changed. 0/0 was still marked as quarantined for about 2.5 minutes.

 

 

This problem has been temporarily solved by permitting only 0/0 from ISP1 and filtering everything else from ISP1 on PE1.
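
A minimal sketch of such an inbound filter, assuming IOS-XR RPL, is shown below. The policy name ISP1-DEFAULT-ONLY, the AS number 64500 and the neighbor address 192.0.2.1 are placeholders for illustration only, not the values used on PE1.

```
! Hypothetical policy name: accept only the default route, drop everything else
route-policy ISP1-DEFAULT-ONLY
  if destination in (0.0.0.0/0) then
    pass
  else
    drop
  endif
end-policy
!
! Placeholder AS number and neighbor address (documentation ranges)
router bgp 64500
 neighbor 192.0.2.1
  address-family ipv4 unicast
   route-policy ISP1-DEFAULT-ONLY in
```

With a filter like this in place, the /9 covering the eBGP peer address is no longer accepted from ISP1, which is consistent with the idea that the /9 plays a role in the recursion loop, but it does not prove it.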

 

The question is: what might be the reasons for this behavior? Could it be the size of the global Internet table and the way PE1 (ASR9001) processes it? My expectation is that once the physical interface and the eBGP session are down, PE1 should immediately withdraw all routes whose next hop is ISP1 (unreachable). Could it be because of that /9 route, which includes the eBGP peer address, even though it comes from the same neighbor? Instead, it took about 2:30 minutes to release the quarantined 0.0.0.0/0 route.

I tried to simulate the setup (again with a larger prefix covering the P2P subnet) and it worked as expected; however, the simulated ISP was sending only a few routes (not 800K+ like the real one).

 

Any suggestions/thoughts are highly appreciated.

 

Regards,

Plamen

 

 

 

 

4 REPLIES
MHM Cisco World
Collaborator

Can you draw the topology?

The ugliest diagram I've ever made.

All internal routers are running multi-level OSPF + LDP for transport label signaling and PE routers are forming MP-IBGP sessions with all RRs for vpnv4 and ipv4 unicast AFIs. Core is BGP free.

smilstea
Cisco Employee

Routes get quarantined for a few reasons: either the route is flapping in and out of the RIB very often, or the next hop is flapping often.

A few commands need to be collected immediately when this happens:

show rib next-hop

show rib next-hop damped

show rib history

show route <prefix>

show route resolving-next-hop <prefix>

show bgp <AFI> <prefix>

show bgp <AFI> nexthops

show bgp <AFI> dampened-paths

 

Also a show tech rib and routing bgp for any traces.

 

Given your symptoms it sounds like the next-hop is flapping.

 

Sam

Thanks for your input, Sam. I'll collect all of these in the next available window. The positive thing is that this behavior is easily reproducible (though not in the lab).

That's my understanding as well regarding a flapping next hop or route; however, I don't see anything flapping. Moreover, "show route 0.0.0.0/0" shows the primary eBGP path via ISP1 (over the directly connected interface) and a backup route (next best path, resolved recursively via IBGP -> IGP). When I manually shut down the ISP-facing interface, there are no IGP changes (there is no redistribute connected and the ISP-facing interface is not included in the IGP process). I'm also worried about the message logged immediately after the physical interface is disabled:

RP/0/RSP0/CPU0: ipv4_rib[1197]: %ROUTING-RIB-7-SERVER_ROUTING_DEPTH : Recursion loop looking up prefix [ISP1 IP address] in Vrf: "default" Tbl: "default" Safi: "Unicast" added by bgp

Once the directly connected network goes down, the only other path to [ISP1 IP address] is either via the summary /9 eBGP route coming from the same eBGP peer over the same administratively disabled interface, which shouldn't be in the RIB anymore (last time I wasn't able to check "show route [ISP1 IP address]" while the interface was down), or via 0.0.0.0/0 (again originated by ISP1, or by a different ISP and learnt over IBGP from the RRs).

 

Regards,

Plamen