Solved: Null0 static routes for BGP aggregates cause redistribution into OSPF

robert.gillen · ‎08-18-2021

Hey all,

We had an issue that caused a large outage. We logged a TAC case but didn't really get an answer as to why; essentially the response was that yes, it does happen, which we had already told them after replicating the issue in the lab. So I'm hoping someone has some insight, mainly so we can understand what is happening. We have since resolved it.

To try to keep it simple, here is a summary of the issue:

Designs and simulations were all modeled on NXOS software
Temporary hardware (using IOS-XE) was installed due to chip shortages and delays in manufacturing
Advertising BGP aggregate subnets to new SDWAN VPN concentrators
BGP aggregates were advertised using "network command" under bgp
Network command routes were defined by Null0 static routes
- ip route vrf BLUE 169.254.244.0 255.255.255.0 Null0 252
Routes received from BGP peers are redistributed into the local core network OSPF process
Issue: Null0 routes with network advertisements got automatically redistributed into OSPF - causing issues
Issue: doesn't occur on NXOS and investigation revealed issue could be replicated on IOS-XE devices (routers and switches)
Issue: doesn't occur with any other static routes and the network command (only those with destination null0)
it's not the redistribute connected statement

Fix:

tag static null0 routes
- ip route vrf BLUE 169.254.244.0 255.255.255.0 Null0 252 tag 1234
deny tag 1234 in the route-map for BGP to OSPF redistribution

I'm stumped as to why, and I only have two guesses: 1 - it's a bug; 2 - is it because Null0 routes are automatically created when using "aggregate-address x.x.x.x" in BGP?

Configs and example debug below.
BGP:
router bgp 65003
bgp router-id 22.22.22.22
bgp log-neighbor-changes
timers bgp 30 90
!
address-family ipv4 vrf BLUE
network 10.0.0.0 mask 255.0.0.0
network 10.112.0.0 mask 255.255.0.0
network 169.254.244.0 mask 255.255.255.0
network 172.16.0.0 mask 255.240.0.0
network 192.168.0.0 mask 255.255.0.0
neighbor 10.66.253.5 remote-as 65001
neighbor 10.66.253.5 activate
neighbor 10.66.253.5 route-map all-DC-subnets-in in
neighbor 10.66.253.5 route-map prepend out
exit-address-family
!

OSPF:

!
router ospf 100 vrf WILCORP
router-id 22.22.22.22
capability vrf-lite
area 0 authentication message-digest
redistribute connected
redistribute bgp 65003 route-map all
passive-interface default
no passive-interface Vlan1997
no passive-interface TenGigabitEthernet1/1/1.1
no passive-interface TenGigabitEthernet1/1/2.1
network 10.66.253.33 0.0.0.0 area 0
network 10.66.254.1 0.0.0.0 area 0
network 10.66.254.5 0.0.0.0 area 0
!

route-map:

!
route-map all deny 1
match ip address prefix-list all-DC-subnets
!
route-map all deny 2
match tag 1234
!
route-map all permit 10
match ip address prefix-list all
set tag 1996
!

debug when the issue occurred:

add null0 route
add the network command
LSA is generated

*Aug 4 04:22:24.500: BGP: Applying map to find origin for 169.254.244.0/24
*Aug 4 04:22:24.501: BGP: Applying map to find origin for 169.254.244.0/24
*Aug 4 04:22:24.501: BGP: Applying map to find origin for 169.254.244.0/24
*Aug 4 04:22:24.501: OSPF-100 LSGEN: Build external LSA 169.254.244.0, mask 255.255.255.0, type 5, age 0, options 0x20, seq 0x80000001
*Aug 4 04:22:24.501: OSPF-100 LSGEN: MTID Metric Metric-type FA Tag Topology Name
*Aug 4 04:22:24.501: OSPF-100 LSGEN: 0 1 2 0.0.0.0 1996 Base
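
For anyone reproducing this, a few show commands that may help confirm which protocol owns the prefix in the routing table and whether the type 5 LSA has been originated. This is only a sketch: the VRF name, OSPF process ID and prefix are taken from the configs above, and the exact output will vary by platform and release.

```
! Which protocol installed the prefix in the VRF routing table
! (with the config above this should be the static route with distance 252)
show ip route vrf BLUE 169.254.244.0 255.255.255.0
!
! Whether BGP holds the prefix as a locally originated path
show bgp vpnv4 unicast vrf BLUE 169.254.244.0/24
!
! Whether OSPF process 100 has an external (type 5) LSA for the prefix
show ip ospf 100 database external 169.254.244.0
```

Comparing the three outputs before and after adding the network statement should show at which step the LSA appears.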

Richard Burts · ‎08-18-2021

There is much we do not know about this situation and perhaps some of that unknown information might change the suggestion that I have. But based on what is described in the original post I suggest that this is what is going on:

- the static route to null0 associates the subnet of the route with an interface (as opposed to a static specifying a next hop which is independent of any interface) and is treated as a locally connected subnet. There are many discussions in the community about this behavior when a static route specifies only an outbound interface and not a next hop.

- now that the subnet is associated with an interface the redistribute connected in OSPF causes the subnet to be redistributed into OSPF.

- The original post says "its not the redistribute connected statement". I wonder on what basis did they determine this?

- The original post says "doesn't occur with any other static routes". I wonder if they tested with other static routes which specify only the outbound interface or tested just with "normal" static routes which specify a next hop?

- the original post says that this behavior occurs on some platforms and does not occur on other platforms. I am surprised at this but accept that it is possible.

- the important thing is that IF you need the static route to null0 and IF you need to redistribute static then you may need a route map to filter out the null0 route.
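
If redistribute static were in play, a filter along these lines might do the job. This is a sketch under assumptions: the route-map name NO-NULL0 is made up for illustration, tag 1234 is the tag the original post applies to the Null0 statics, and the OSPF process line is copied from the original post (which does not actually redistribute static).

```
! Hypothetical route-map name; tag 1234 is the tag set on the Null0 static routes
route-map NO-NULL0 deny 10
 match tag 1234
!
! Permit everything that was not tagged
route-map NO-NULL0 permit 20
!
router ospf 100 vrf WILCORP
 redistribute static route-map NO-NULL0
```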

HTH

Rick

Giuseppe Larosa · ‎08-18-2021

Hello @robert.gillen ,

You have provided an initial post with many details that describes a difference in behaviour between the current production devices running IOS XE and the devices used in the design and simulation phase, which were based on NX-OS (Nexus switches of some type).

This is not the first or only thread where differences in routing protocols and redistribution between IOS / IOS XE and NX-OS devices are described.

You have already found a workaround that uses a modified version of the route-map used to filter BGP routes redistribution into OSPF using a route-tag in the static routes and then denying routes having that route tag value.

If we review the more current documentation about redistribution in IPv4 unicast between two dynamic protocols we find that the conditions for prefixes to be passed from protocol A to protocol B are:

a) the prefix is installed in the IP routing table by protocol A ( BGP in your case)

b) a special case exists for connected routes that appear in the routing table as connected routes but they are also matching a network statement.

Now, you have static routes to null0 that match a network statement under router bgp. These network statements inject the corresponding prefixes into the BGP table as locally originated routes with next-hop 0.0.0.0 and weight 32768.

As noted by @Richard Burts this type of configuration leads these static routes to be treated as similar to connected routes.

What is really interesting in your scenario is that the router OSPF configuration also has a redistribute connected, yet you fixed the route leakage by changing the route-map applied to the BGP into OSPF redistribution, with no route-map applied to redistribute connected.

On the other hand, we know that BGP has its own BGP tables or RIB and the locally injected routes / prefixes are part of this and they are considered best path and advertised to BGP peer(s).

So this is probably the "grey zone of implementation": the corresponding routes are installed in IP routing table as static routes with exit interface null0 ( it would be interesting to test using a static route to a physical interface to see if there is any change) but they are also prefixes in the BGP table locally injected and best path advertised to BGP peer(s).

From the BGP configuration point of view you could use aggregate-address ..... summary-only instead of static to null0 + network command, but you would need network commands in BGP for component subnets to trigger the advertising.

I personally prefer to use aggregate-address now instead of the combo static route to null0 + network command.

So I would suggest trying it this way to see if anything changes.

Warning : when using aggregate-address you will need BGP network commands for some component subnets so you may need to filter them in redistribution of BGP into OSPF, in other words the problem can just shift to the component subnets of each aggregate.
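
A rough sketch of what the aggregate-address approach might look like for the 169.254.244.0/24 aggregate from the original post. The component subnet shown is hypothetical (the thread does not list the real component subnets), and it has to be present in the routing table for its network statement to take effect.

```
router bgp 65003
 address-family ipv4 vrf BLUE
  ! Replaces the Null0 static + network statement for the /24 aggregate
  aggregate-address 169.254.244.0 255.255.255.0 summary-only
  ! Hypothetical component subnet; at least one more-specific prefix
  ! must be in the BGP table for the aggregate to be generated
  network 169.254.244.0 mask 255.255.255.128
 exit-address-family
```

With summary-only the component subnet is suppressed towards the BGP peers, but it may still need to be filtered in the BGP into OSPF route-map, as the warning says.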

Hope to help

Giuseppe

robert.gillen · ‎08-18-2021

Hey Rick, cheers for the reply.

Interesting points, and to add some info:

- the static route to null0 associates the subnet of the route with an interface (There are many discussion in the community about this behavior when a static route specifies only an outbound interface and not a next hop):
I thought the same, though I tested this using 169.254 addresses (when there were no matching interfaces) and had the same result occur.

- The original post says "its not the redistribute connected statement". I wonder on what basis did they determine this?
This was my original assumption too, and I logged a TAC case regarding it. I was able to prove it wasn't (I'm pretty sure) by:
1 - added a single null0 route without adding advertisement into BGP - no OSPF propagation.
2 - added new loopback interface with 169.254 address - OSPF propagation occurs.
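
Roughly, the two tests would look like the sketch below. The loopback number and its address are made up for illustration (the exact values used in the lab are not given here), and the VRF is assumed to be the one used by the Null0 static in the original post.

```
! Test 1: Null0 static only, with no matching network statement under BGP
ip route vrf BLUE 169.254.244.0 255.255.255.0 Null0 252
!
! Test 2: a genuinely connected 169.254 subnet
! (hypothetical loopback number and address; VRF assumed from the static above)
interface Loopback254
 vrf forwarding BLUE
 ip address 169.254.245.1 255.255.255.0
```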

- The original post says "doesn't occur with any other static routes". I wonder if they tested with other static routes which specify only the outbound interface or tested just with "normal" static routes which specify a next hop?

We tested with both a static route pointing to a normal IP address and one pointing to a loopback interface.
Our current WAN IPVPN configs all have this too with no issue.

- the important thing is that IF you need the static route to null0 and IF you need to redistribute static then you may need a route map to filter out the null0 route.

Yeah - no redistribute static has been configured or needed.

Rob

Giuseppe Larosa · ‎08-18-2021