Looking for an explanation of weird OSPF behavior

thomas-ravail · ‎11-24-2024

Hi Everyone,

A weird routing behavior occurred between two routers, and I'm looking for an explanation. We configured OSPF (broadcast network type) between two routers (R1 and R2), with a default route redistributed from BGP into OSPF on R1. R2 was also advertising some routes.

For a project, we applied an access list on R1, on the interface facing R2. By mistake, this ACL included the interconnection IP subnet used between R1 and R2, so unicast packets from R2 to R1 were dropped. This subnet should have been excluded from the ACL. The OSPF neighborship didn't go down because the Hello packets are sent via multicast to 224.0.0.5. After 35 hours, the neighborship was still UP, but none of the OSPF routes were being announced anymore (causing an incident).
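
To illustrate why such an ACL would leave the Hellos untouched, here is a minimal Python sketch. The real ACL is not shown in this thread, so the sketch assumes it matched on destination address, and 192.0.2.0/30 is a hypothetical stand-in for the R1-R2 interconnection subnet.

```python
import ipaddress

# Hypothetical: 192.0.2.0/30 stands in for the R1-R2 interconnection subnet.
# Assumption: the ACL entry matched packets by destination address.
DENIED_DESTINATIONS = [ipaddress.ip_network("192.0.2.0/30")]


def is_dropped(destination: str) -> bool:
    """Return True if the destination falls inside a denied subnet of this toy ACL."""
    addr = ipaddress.ip_address(destination)
    return any(addr in net for net in DENIED_DESTINATIONS)


# Hello sent to AllSPFRouters: not inside the interconnection subnet.
print(is_dropped("224.0.0.5"))   # False
# Unicast OSPF packet addressed to R1's interface (hypothetical address).
print(is_dropped("192.0.2.1"))   # True
```

Under that assumption, anything OSPF sends to 224.0.0.5 passes, while anything addressed directly to the neighbor's interface IP is dropped.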

We performed a shut/no shut on one interface and the OSPF adjacency stayed DOWN (due to the ACL dropping the unicast packets). It came UP immediately after we removed the ACL, and the routes were exchanged.

--> We're now trying to understand why the OSPF routes disappeared from the routing table after such a long delay while the OSPF neighborship was UP. When I reproduce it in GNS3, all Hello packets as well as LS Update and LS Acknowledge packets (when a new advertisement is made) are sent via multicast, not unicast.
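
For reference, the GNS3 observation fits how OSPFv2 addresses its packets on a broadcast network, as I understand RFC 2328: steady-state traffic is multicast, while database exchange and retransmissions go directly to the neighbor. The sketch below is only a lookup table in Python; the dictionary and function names are made up for illustration.

```python
# OSPFv2 destination addressing on a broadcast network, summarized from RFC 2328.
# The names below are illustrative, not part of any library.
OSPF_PACKET_ADDRESSING = {
    "Hello": "multicast",
    "Database Description": "unicast",
    "Link State Request": "unicast",
    "Link State Update (initial flood)": "multicast",
    "Link State Update (retransmission)": "unicast",
    "Link State Ack (delayed)": "multicast",
    "Link State Ack (direct)": "unicast",
}


def blocked_when_unicast_dropped() -> list[str]:
    """List the packet kinds that would be lost if only unicast is filtered."""
    return [kind for kind, mode in OSPF_PACKET_ADDRESSING.items() if mode == "unicast"]


for kind in blocked_when_unicast_dropped():
    print(kind)
```

Database Description packets are in the blocked list, which would explain why an existing adjacency can look healthy while a new one cannot get past EXSTART after a shut/no shut.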

Thanks for your help.

paul driver · ‎11-26-2024

Hello

@thomas-ravail wrote:
Yes, couldn't agree more with your analysis, but the delay between the time we applied the ACL and the incident is huge (35 hours)

Nov 21 06:14:04.115 GMT: %OSPF-5-ADJCHG: Process 100, Nbr X.X.X.X on GigabitEthernet0/0/0 from EXSTART to DOWN, Neighbor Down: Too many retransmissions
Nov 21 06:15:04.115 GMT: %OSPF-5-ADJCHG: Process 100, Nbr X.X.X.X on GigabitEthernet0/0/0 from DOWN to DOWN,

Based on these log outputs, OSPF was trying to establish the adjacency, but on both sides the OSPF status was UP (this was confirmed many times during the incident).

This suggests the OSPF adjacency had already been lost due to some event (e.g. a flapping interface), and it was at that point that your routes were withdrawn. That would also mean your applied ACL wasn't fully restrictive: it still allowed the LSAs to be refreshed up to the time the OSPF adjacency was torn down. At that point the adjacency tried to re-establish and got stuck in the EXSTART state due to the unicast restriction of your applied interface ACL.
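
A rough bit of arithmetic supports the point about LSA refreshes, using the RFC 2328 constants LSRefreshTime (30 minutes) and MaxAge (1 hour). This is only a sketch in Python: the function name is hypothetical and the model ignores everything except aging.

```python
# RFC 2328 architectural constants, in seconds.
LS_REFRESH_TIME_S = 30 * 60  # originator re-floods its own LSAs at this interval
MAX_AGE_S = 60 * 60          # an LSA reaching this age is no longer used for routing


def lsa_usable(seconds_since_last_refresh: int) -> bool:
    """Simplified model: an LSA stays usable only while younger than MaxAge."""
    return seconds_since_last_refresh < MAX_AGE_S


delay_s = 35 * 3600  # time between applying the ACL and the incident

# Number of refresh intervals that fit into the 35-hour delay.
print(delay_s // LS_REFRESH_TIME_S)  # 70
# Would an LSA last refreshed when the ACL was applied still be usable?
print(lsa_usable(delay_s))           # False
```

If refreshes had been blocked from the moment the ACL was applied, the routes would have aged out after about an hour, not 35 hours, so the refresh floods were most likely still getting through until the adjacency reset.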

Kind Regards
Paul