ASR9001 - Quarantined 0.0.0.0/0 route

Plamen Mladenov
Level 1

Hello,

 

We have a very simple setup:

 

The setup:

 

In an MPLS environment, the PE1 router (ASR9001, IOS-XR 6.6.3) peers with an upstream Internet provider, ISP1, over a standard single-hop eBGP session and receives the full Internet table plus a default route from ISP1 into the default routing table (not in a VRF).

The same PE1 router forms MP-IBGP sessions with a few dedicated BGP route reflectors (RRs). For the IPv4 AFI it sends the default route (learnt from ISP1) to the RRs with next-hop-self, along with a few other customers' routes not related to ISP1. In the opposite direction, all RRs advertise 0.0.0.0/0 to PE1, coming from different Internet-facing PEs and different ISPs.

There is no manipulation of BGP attributes for the 0.0.0.0/0 prefix (except next-hop-self from PEs), so PE1 selects the externally learnt 0.0.0.0/0 route from its eBGP session to ISP1. All RRs are sending "Additional Paths" to all PEs. BGP PIC Edge is enabled on PEs (including PE1).

 

The problem:

 

All is good so far, unless there is an ISP DIA circuit failure (there's a working BFD session between PE1 and ISP1). I simulate the failure by manually disabling the physical interface between PE1 and ISP1 (on the PE1 end).

After that a few things happen:

  • BFD and eBGP sessions went down immediately ("CEASE notification sent - administrative shutdown") - expected.
  • The router logged a suspicious message about a recursion loop for its eBGP peer IP address:
    • RP/0/RSP0/CPU0: ipv4_rib[1197]: %ROUTING-RIB-7-SERVER_ROUTING_DEPTH : Recursion loop looking up prefix [ISP1 IP address] in Vrf: "default" Tbl: "default" Safi: "Unicast" added by bgp
  • 0.0.0.0/0 became quarantined on PE1 (no other prefixes were in the quarantined state)
  • The BGP process on PE1 did NOT send a withdraw message for the 0.0.0.0/0 prefix to the route reflectors (I ran “debug ip bgp updates PE1-Loopback0-IP” on one of the RRD route reflectors, and while 0/0 was in the quarantined state no update was received from PE1)
    • Other non-Internet-facing PEs were still using PE1 as the best path for 0/0 (although there is a valid backup path to another PE)
    • PE1 was blackholing traffic (route to 0/0 was quarantined)
  • 2 minutes and 25 seconds later, PE1 removed the 0/0 prefix from the quarantined state, installed 0/0 coming from PE2 (the previous backup route) and sent a BGP withdraw message to the route reflectors, which was propagated to the rest of the PEs

I haven't found much about RIB quarantining; it looks like some kind of protection mechanism against route oscillations. I checked and ensured that the IGP is completely stable and that there were no BGP updates going to the RRs (as I wrote, only the default route and a few internal routes are sent to the RRs, NOT the full Internet BGP table). ISP1 is NOT advertising the P2P /30 subnet into BGP, nor is PE1; however, ISP1 advertises a larger /9 block (and 0/0). I tried to disable the default RIB dampening mechanism with:

 

router rib
address-family ipv4 (hidden command)
   next-hop dampening disable

but nothing changed. 0/0 was still marked as quarantined for 2.5 minutes.

 

 

This problem has been temporarily solved by permitting only 0/0 from ISP1 and filtering everything else from ISP1 on PE1.

 

The question is: what might be the reasons for this behavior? Could it be the size of the global Internet table and the way PE1 (ASR9001) processes it? My expectation is that once the physical interface is down and the eBGP session is down, PE1 should immediately withdraw all routes with next hop ISP1 (unreachable). Could it be because of that /9 route (which includes the eBGP peer address, although it's coming from the same neighbor)? It took 2:30 minutes to release the quarantined 0.0.0.0/0 route.

I tried to simulate the setup (again with a larger prefix including the P2P subnet), and it worked as expected; however, the simulated ISP was sending only a few routes (not 800K+ as the real one does).

 

Any suggestions/thoughts are highly appreciated.

 

Regards,

Plamen

 

 

 

 

5 Replies

Can you draw the topology?

The ugliest diagram I've ever made.

All internal routers are running multi-level OSPF + LDP for transport label signaling and PE routers are forming MP-IBGP sessions with all RRs for vpnv4 and ipv4 unicast AFIs. Core is BGP free.

smilstea
Cisco Employee

Quarantined routes happen for a few reasons: either the route is flapping in and out of the RIB very often, or the next-hop is flapping often.

A few commands need to be gathered immediately when this happens:

show rib next-hop

show rib next-hop damped

show rib history

show route <prefix>

show route resolving-next-hop <prefix>

show bgp <AFI> <prefix>

show bgp <AFI> nexthops

show bgp <AFI> dampened-paths

 

Also a show tech rib and routing bgp for any traces.
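
Since the quarantined state only lasts a couple of minutes, it may help to have the command list pre-expanded and ready to paste. The following is only a small sketch in Python that fills in the <AFI> and <prefix> placeholders from the list above; the function name build_commands and the default AFI string "ipv4 unicast" are assumptions made for this example, not something taken from the thread.

```python
# Command templates copied from the list above; <AFI> and <prefix> are the
# placeholders used there.
TEMPLATES = [
    "show rib next-hop",
    "show rib next-hop damped",
    "show rib history",
    "show route <prefix>",
    "show route resolving-next-hop <prefix>",
    "show bgp <AFI> <prefix>",
    "show bgp <AFI> nexthops",
    "show bgp <AFI> dampened-paths",
]


def build_commands(prefixes, afi="ipv4 unicast"):
    """Expand each template; prefix-specific commands are repeated per prefix."""
    commands = []
    for template in TEMPLATES:
        base = template.replace("<AFI>", afi)
        if "<prefix>" in base:
            commands.extend(base.replace("<prefix>", p) for p in prefixes)
        else:
            commands.append(base)
    return commands


if __name__ == "__main__":
    # Hypothetical usage: the default route and the eBGP peer address.
    for line in build_commands(["0.0.0.0/0", "203.0.113.1"]):
        print(line)
```

The output can be pasted into the session as soon as the interface goes down, so nothing is missed while the route is still quarantined.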

 

Given your symptoms it sounds like the next-hop is flapping.

 

Sam

Thanks for your input here Sam, I'll collect all of these in the next available window. The positive thing is that this behavior is easily reproducible (not in the LAB, though).

That's my understanding as well regarding a flapping next-hop or route; however, I don't see anything flapping. Moreover, "show route 0.0.0.0/0" shows the primary eBGP path via ISP1 (going to the directly connected interface) and a backup route (next best path, recursively using IBGP -> IGP). When I manually shut down the ISP-facing interface, there are no IGP changes (there's no redistribute connected and the ISP-facing interface is not included in the IGP process). I'm also worried about the message logged immediately after the physical interface is disabled:

RP/0/RSP0/CPU0: ipv4_rib[1197]: %ROUTING-RIB-7-SERVER_ROUTING_DEPTH : Recursion loop looking up prefix [ISP1 IP address] in Vrf: "default" Tbl: "default" Safi: "Unicast" added by bgp

The only other path to [ISP1 IP address] when the directly connected network goes down is via the summary /9 eBGP route coming from the same eBGP peer over the same administratively disabled interface, which shouldn't be in the RIB anymore (last time I wasn't able to check "show route [ISP1 IP address]" while the interface was down), or via 0.0.0.0/0 (again originated by ISP1, or by a different ISP and learnt over IBGP from the RRs).
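
To make the suspected loop concrete, here is a toy model in Python of recursive next-hop resolution by longest-prefix match. It is only an illustration of the hypothesis above, not a description of how the IOS-XR RIB actually resolves next hops. The /9 prefix 203.0.0.0/9 is a hypothetical stand-in for the obscured ISP block, and the function names are invented for this sketch.

```python
import ipaddress


def longest_match(rib, address):
    """Return the most specific prefix in rib that covers address, or None."""
    addr = ipaddress.ip_address(address)
    candidates = [p for p in rib if addr in ipaddress.ip_network(p)]
    if not candidates:
        return None
    return max(candidates, key=lambda p: ipaddress.ip_network(p).prefixlen)


def resolve(rib, address, max_depth=8):
    """Follow next hops until a connected route is reached.

    rib maps prefix -> next-hop address, or None for a connected route.
    Returns (status, prefixes visited), where status is "resolved",
    "unresolved" or "loop".
    """
    visited = []
    current = address
    for _ in range(max_depth):
        prefix = longest_match(rib, current)
        if prefix is None:
            return "unresolved", visited
        if prefix in visited:
            return "loop", visited + [prefix]
        visited.append(prefix)
        next_hop = rib[prefix]
        if next_hop is None:
            return "resolved", visited
        current = next_hop
    return "loop", visited


if __name__ == "__main__":
    # Interface up: the connected /30 covers the peer address.
    rib_up = {
        "203.0.113.0/30": None,          # connected P2P subnet
        "203.0.0.0/9": "203.0.113.1",    # hypothetical /9 learnt from ISP1
        "0.0.0.0/0": "203.0.113.1",      # default route learnt from ISP1
    }
    print(resolve(rib_up, "203.0.113.1"))

    # Interface down: the /30 is gone but the BGP routes are not yet removed.
    rib_down = {k: v for k, v in rib_up.items() if k != "203.0.113.0/30"}
    print(resolve(rib_down, "203.0.113.1"))
```

In this model, once the connected /30 disappears, the peer address is covered only by routes whose next hop is that same peer address, so the lookup keeps landing on the same prefix. Whether this is what triggers the SERVER_ROUTING_DEPTH message on the real router is still an open question.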

 

Regards,

Plamen

Hello Sam,

 

We have exactly the same behavior on PE2 (ASR9001 platform again, but with much older code - IOS-XR 5.3.2). So there's something terribly wrong here, either with a configuration or a bug related to a specific setting (I'll try to simplify the config as much as possible during next maintenance window, disabling BFD and some extra BGP config).

It's a really annoying problem, dropping traffic for almost 3 minutes, although there's already a backup path pre-calculated for 0/0 and inserted into the FIB table.

Again if I only allow 0/0 and block everything else coming from the directly connected ISP, there is NO issue.

 

Some of the requested outputs taken from PE1 (ASR9001 with IOS-XR 6.6.3) are below. I've obscured some of the sensitive information (IPs, ASN, etc.).

ISP1 circuit is physically & logically terminated on PE1 interface Te0/0/2/3. ISP1 is using 203.0.113.1/30, PE1 is using 203.0.113.2/30 on Te0/0/2/3.

 

All of the outputs (except the last one) were taken during the issue (~2.5 minutes).

 

show rib next-hop

RP/0/RSP0/CPU0:PE1#show rib next-hop
Sun Jul 25 22:11:14.199 CDT

Registered nexthop notifications:
A - Active route, B - First backup route.

 (A) 0.0.0.0/0 via 203.0.113.1 - TenGigE0/0/2/3, ospf/node0_RSP0_CPU0
 (A) 203.0.113.1/32 via 0.0.0.0 - None, bgp/node0_RSP0_CPU0

 

It is showing ospf here, which is strange to me, considering that 0.0.0.0/0 is not in the OSPF database.

There is "redistribute connected" under the OSPF process with a route-policy matching only downlink interfaces and loopback0 (so the ISP-facing interface is not included in the OSPF process, not redistributed, and not in the OSPF database either). Of course there is no OSPF to BGP or BGP to OSPF redistribution.

 

RP/0/RSP0/CPU0:PE1#show rib next-hop 0.0.0.0/0
Sun Jul 25 22:11:17.257 CDT

Firsthop prefix: 0.0.0.0/0
  Flags: exact match, allow default, recurse
  Last event occurred Jul 18 01:07:59.525, 1w0d ago; version 155

  Registered clients:
    ospf/node0_RSP0_CPU0 created Apr  9 14:43:26.385, 1y15w ago
       read last notification at Jul 18 01:07:59.528, 1w0d ago
       reference count is 1

  Destination paths:
    203.0.113.1 - TenGigE0/0/2/3

  Resolving route: 0.0.0.0/0 known via "bgp OUR-PUBLIC-ASN#"
  Metric computed: 0
RP/0/RSP0/CPU0:PE1#show rib next-hop 203.0.113.1/32
Sun Jul 25 22:11:21.019 CDT

Firsthop prefix: 203.0.113.1/32
  Flags: recurse
  Last event occurred Jul 25 22:10:39.844, 00:00:41 ago; version 26

  Registered clients:
    bgp/node0_RSP0_CPU0 created Jun  4 15:22:06.021, 1y07w ago
       read last notification at Jul 25 22:10:39.847, 00:00:41 ago
       reference count is 1

  Firsthop is unresolved
RP/0/RSP0/CPU0:PE1#show rib next-hop damped
Sun Jul 25 22:11:24.587 CDT

Damped nexthop notifications:
A - Active route, B - First backup route.

No damped routes

RP/0/RSP0/CPU0:PE1#show rib history
Sun Jul 25 22:11:27.669 CDT

JID   Client (CID)       Location
0     bcdl_ug (1)              node0_RSP0_CPU0


JID   Client (CID)       Location
1029  ospf (15)                node0_RSP0_CPU0
  Table ID: 0xe0000000
     C 203.0.113.0/30                   deleted,            3  00:00:48
     L 203.0.113.2/32                   deleted,            3  00:00:48
     C 203.0.113.0/30[0/0]               update,  1 paths, 0x1082  4 1w0d
     L 203.0.113.2/32[0/0]               update,  1 paths, 0x1081  3 1w0d


JID   Client (CID)       Location
1083  bgp (17)                 node0_RSP0_CPU0


JID   Client (CID)       Location
0     bcdl_ug (18)             node0_RSP0_CPU0
  Table ID: 0xe0000000
     B 0.0.0.0/0 [20/0]                  update,  1 paths, 0x0004 10 00:00:48
     L 203.0.113.2/32                   deleted,            3  00:00:48
     C 203.0.113.0/30                   deleted,            3  00:00:48
     B 201.220.154.0/24                 deleted,           12  00:01:00
     B 200.198.192.0/18                 deleted,           12  00:01:08
     B 138.207.67.0/24                  deleted,           12  00:01:09
     B 138.207.66.0/24                  deleted,           12  00:01:09
     B 197.186.0.0/15                   deleted,           12  00:01:10
     B 222.54.224.0/19[20/24811]         update,  1 paths, 0x0200 12 00:01:12
     B 220.158.174.0/23[20/20120]        update,  1 paths, 0x0200 12 00:01:12
     B 220.158.172.0/23[20/20120]        update,  1 paths, 0x0200 12 00:01:12
     B 216.171.184.0/21[20/22573]        update,  1 paths, 0x0200 12 00:01:12


JID   Client (CID)       Location
0     bcdl_ug (19)             node0_RSP0_CPU0
  Table ID: 0xe0000029
     B 0.0.0.0/0 [200/0]                 update,  1 paths, 0x0004 12 00:00:50
  Table ID: 0xe0000027
     B 0.0.0.0/0 [200/0]                 update,  1 paths, 0x0004 12 00:00:50
  Table ID: 0xe0000012


JID   Client (CID)       Location
1224  mpls_ldp (20)            node0_RSP0_CPU0
  Table ID: 0xe0000000
     C 203.0.113.0/30                   deleted,            3  00:00:52
     L 203.0.113.2/32                   deleted,            3  00:00:52

show route 0.0.0.0/0

RP/0/RSP0/CPU0:PE1#show route 0.0.0.0/0
Sun Jul 25 22:11:35.548 CDT

Routing entry for 0.0.0.0/0
  Known via "bgp OUR-PUBLIC-ASN#", distance 20, metric 0, candidate default path
  Tag ISP-PUBLIC-ASN#
  Number of pic paths 1 , type internal and external
  Installed Jul 23 00:51:15.216 for 2d21h
  Routing Descriptor Blocks
    PE2-Loopback-IP, from RR1, BGP backup path
      Route metric is 0
    203.0.113.1, from 203.0.113.1 (quarantined), BGP external
      Route metric is 0
  No advertising protos.

203.0.113.1 shouldn't be here anymore, since the BGP session is in the idle state because the physical interface is admin down.
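
If the output is being captured repeatedly during a test window, a small script can flag the quarantined path automatically. This is a rough Python sketch that assumes the path lines look exactly like the ones in the "show route" output above; the regular expression and the function name quarantined_paths are made up for this example.

```python
import re

# Matches a path line under "Routing Descriptor Blocks", for example:
#     203.0.113.1, from 203.0.113.1 (quarantined), BGP external
PATH_LINE = re.compile(
    r"^\s+(?P<nexthop>[^\s,]+), from (?P<source>[^\s,]+)(?P<flags>.*)$"
)


def quarantined_paths(show_route_output):
    """Return (next hop, source) pairs whose path line is marked quarantined."""
    hits = []
    for line in show_route_output.splitlines():
        match = PATH_LINE.match(line)
        if match and "(quarantined)" in match.group("flags"):
            hits.append((match.group("nexthop"), match.group("source")))
    return hits


if __name__ == "__main__":
    sample = """Routing entry for 0.0.0.0/0
  Routing Descriptor Blocks
    PE2-Loopback-IP, from RR1, BGP backup path
      Route metric is 0
    203.0.113.1, from 203.0.113.1 (quarantined), BGP external
      Route metric is 0
"""
    print(quarantined_paths(sample))
```

Running it against successive captures gives a simple way to see when the quarantined marker first appears and when it clears.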

RP/0/RSP0/CPU0:PE1#show route 203.0.113.1
Sun Jul 25 22:11:38.938 CDT

Routing entry for 0.0.0.0/0
  Known via "bgp OUR-PUBLIC-ASN#", distance 20, metric 0, candidate default path
  Tag ISP-PUBLIC-ASN#
  Number of pic paths 1 , type internal and external
  Installed Jul 23 00:51:15.216 for 2d21h
  Routing Descriptor Blocks
    PE2-Loopback-IP, from RR1, BGP backup path
      Route metric is 0
    203.0.113.1, from 203.0.113.1 (quarantined), BGP external
      Route metric is 0
  No advertising protos.

 

RP/0/RSP0/CPU0:PE1#show route resolving-next-hop 0.0.0.0
Sun Jul 25 22:11:42.207 CDT

% Network not in table

show route resolving-next-hop 203.0.113.1

RP/0/RSP0/CPU0:PE1#show route resolving-next-hop 203.0.113.1
Sun Jul 25 22:11:45.485 CDT

% Network not in table

show bgp 0.0.0.0/0

RP/0/RSP0/CPU0:PE1#show bgp 0.0.0.0/0
Sun Jul 25 22:11:48.479 CDT
BGP routing table entry for 0.0.0.0/0
Versions:
  Process           bRIB/RIB  SendTblVer
  Speaker          322484399   322484399
Last Modified: Jul 25 22:11:04.759 for 00:00:43
Paths: (27 available, best #3)
  Advertised IPv4 Unicast paths to update-groups (with more than one peer):
    0.3 0.4
  Advertised IPv4 Unicast paths to peers (in unique update groups):
    Customer
  Path #1: Received by speaker 0
  Not advertised to any peer
  ISP21_ASN#
    PE21 (metric 2001) from RR1 (PE21)
      Origin IGP, metric 0, localpref 100, valid, internal, backup, add-path, import-candidate
      Received Path ID 0, Local Path ID 34, version 322484399
      Originator: PE21, Cluster list: 1
....
  Path #3: Received by speaker 0
  Advertised IPv4 Unicast paths to update-groups (with more than one peer):
    0.3 0.4
  Advertised IPv4 Unicast paths to peers (in unique update groups):
    Customer
  ISP2_ASN#
    PE2 (metric 1000) from RR1 (PE2)
      Origin IGP, localpref 100, valid, internal, best, group-best, import-candidate
      Received Path ID 3, Local Path ID 1, version 322484399
      Originator: PE2, Cluster list: 1

There are 27 paths for 0/0; the proper one is selected as best and another one as the backup route. Keep in mind that all RRs are reflecting only the default route originated by ISPs and a few local routes (not the full Internet BGP table, which is local to the PEs).

 

RP/0/RSP0/CPU0:PE1#show bgp dampened-paths
Sun Jul 25 22:11:54.826 CDT

show route 0.0.0.0/0

RP/0/RSP0/CPU0:PE1#show route 0.0.0.0/0
Sun Jul 25 22:11:59.729 CDT

Routing entry for 0.0.0.0/0
  Known via "bgp OUR-PUBLIC-ASN#", distance 20, metric 0, candidate default path
  Tag ISP-PUBLIC-ASN#
  Number of pic paths 1 , type internal and external
  Installed Jul 23 00:51:15.217 for 2d21h
  Routing Descriptor Blocks
    PE2, from RR1, BGP backup path
      Route metric is 0
    203.0.113.1, from 203.0.113.1 (quarantined), BGP external
      Route metric is 0
  No advertising protos.

PE2 should become primary and 203.0.113.1 shouldn't be there anymore; however, it's shown as "quarantined".

 

RP/0/RSP0/CPU0:PE1#show cef 0.0.0.0/0
Sun Jul 25 22:12:05.498 CDT
0.0.0.0/0, version 573914642, proxy default, internal 0x1000011 0x0 (ptr 0x9dfb7068) [1], 0x0 (0x0), 0x0 (0x0)
 Updated Jul 25 22:10:39.852
 Prefix Len 0, traffic index 0, precedence n/a, priority 4
   via PE2-Loopback/32, 8 dependencies, recursive [flags 0x6000]
    path-idx 0 NHID 0x0 [0xa57fa3a0 0x0]
    next hop PE2-Loopback/32 via PE2-Loopback/32

Although CEF is showing the correct entry, the router is still blocking traffic and the RIB is incorrect:

RP/0/RSP0/CPU0:PE1#show route 203.0.113.1
Sun Jul 25 22:12:18.053 CDT

Routing entry for 0.0.0.0/0
  Known via "bgp OUR-PUBLIC-ASN#", distance 20, metric 0, candidate default path
  Tag ISP1-PUBLIC-ASN#
  Number of pic paths 1 , type internal and external
  Installed Jul 23 00:51:15.217 for 2d21h
  Routing Descriptor Blocks
    PE2, from RR1, BGP backup path
      Route metric is 0
    203.0.113.1, from 203.0.113.1 (quarantined), BGP external
      Route metric is 0
  No advertising protos.
RP/0/RSP0/CPU0:PE1#

 

 

Fixed after a few minutes:

RP/0/RSP0/CPU0:PE1#show route 0.0.0.0/0
Sun Jul 25 22:13:02.102 CDT

Routing entry for 0.0.0.0/0
  Known via "bgp OUR-PUBLIC-ASN#", distance 200, metric 0, candidate default path
  Tag ISP2-BGP-ASN#
  Number of pic paths 1 , type internal
  Installed Jul 25 22:12:41.442 for 00:00:20
  Routing Descriptor Blocks
    PE21, from RR1, BGP backup path
      Route metric is 0
    PE2, from RR1
      Route metric is 0
  No advertising protos.
RP/0/RSP0/CPU0:PE1#
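
The timestamps in the outputs above give a rough measure of how long the stale entry stayed in the RIB: the next-hop 203.0.113.1/32 reported its last event at Jul 25 22:10:39.844, and the new 0.0.0.0/0 entry was installed at Jul 25 22:12:41.442. The short Python sketch below computes the gap between those two events. The outputs carry no year, so the year used for parsing is an arbitrary placeholder, and month-name parsing assumes an English locale.

```python
from datetime import datetime

# IOS-XR style timestamp without a year, e.g. "Jul 25 22:10:39.844".
STAMP_FORMAT = "%Y %b %d %H:%M:%S.%f"


def elapsed_seconds(start, end, year=2000):
    """Seconds between two same-year timestamps; year is only a placeholder."""
    t0 = datetime.strptime(f"{year} {start}", STAMP_FORMAT)
    t1 = datetime.strptime(f"{year} {end}", STAMP_FORMAT)
    return (t1 - t0).total_seconds()


if __name__ == "__main__":
    # Next-hop last event (show rib next-hop 203.0.113.1/32) versus the
    # install time of the replacement route (final show route 0.0.0.0/0).
    gap = elapsed_seconds("Jul 25 22:10:39.844", "Jul 25 22:12:41.442")
    print(f"{gap:.3f} seconds")
```

For these two timestamps the gap is about 122 seconds, which is the interval between the next-hop becoming unresolved and the backup path being installed as the active route.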

 

Regards,

Plamen
