07-26-2023 12:15 AM - last edited on 10-11-2023 02:24 AM by Translator
Hello Everyone,
Our customer has periodic outages because of OSPF flapping. The flap occurs 2-3 times per day, at random. The topology is the following:
Checkpoint Firewall A -------------- Vlan7 -------------- Checkpoint Firewall B
        |                  VRRP VIP: X.X.X.155                      |
        | Vlan7                                                Vlan7|
        |                                                           |
    X.X.X.156                                                  X.X.X.157
Cisco Catalyst 4500 L3 Switch A ----- Vlan7 ----- Cisco Catalyst 4500 L3 Switch B
OSPF flaps on Vlan7 only, and only between the switches and the FWs. It is stable between the two switches. These are the log messages:
*Jul 25 04:40:18.391 CET: %OSPF-5-ADJCHG: Process 65138, Nbr X.X.X.155 on Vlan7 from FULL to DOWN, Neighbor Down: Too many retransmissions
*Jul 25 04:41:18.391 CET: %OSPF-5-ADJCHG: Process 65138, Nbr X.X.X.155 on Vlan7 from DOWN to DOWN, Neighbor Down: Ignore timer expired
*Jul 25 04:41:32.016 CET: %OSPF-5-ADJCHG: Process 65138, Nbr X.X.X.155 on Vlan7 from LOADING to FULL, Loading Done
The FWs use their VRRP VIP (X.X.X.155) for OSPF.
Switch A is the OSPF DR, Switch B is the OSPF BDR.
I only have access to the switches.
The problem has been present for at least 2 months.
I did several packet captures and OSPF debugs. I noticed the following:
- Either switch A or switch B sends out a multicast LSU containing LSAs X, Y, Z.
- The FW and the neighboring switch send an LSAck which acknowledges LSAs X, Y, Z.
- One of the switches (regardless of which one sent the original multicast LSU) starts to send unicast LSUs containing LSAs X, Y, Z to the FW.
- The FW does not ACK these unwanted / unnecessary unicast LSUs; I guess it treats them as malicious traffic.
- After the switch has sent 25 unicast LSUs and missed 25 LSAcks, it deletes the OSPF neighbor towards the FW (the commands I used to watch this are shown below).
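For reference, these are the kinds of outputs I was watching (exact debug keywords can differ per IOS release, so treat this as a sketch):
! Per-neighbor retransmission counter and retransmit queue length:
Switch-A#show ip ospf 65138 neighbor detail
! Adjacency changes and retransmission events:
Switch-A#debug ip ospf adj
Switch-A#debug ip ospf retransmission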
config:
Switch-A#sh run int vl7
Building configuration...
Current configuration : 469 bytes
!
interface Vlan7
description ** VLAN 7 **
ip vrf forwarding INTERNAL
ip address X.X.X.156 255.255.255.224
no ip redirects
no ip proxy-arp
standby 37 ip X.X.X.158
standby 37 priority 105
standby 37 preempt
standby 37 authentication md5 (...)
ip ospf authentication message-digest
ip ospf message-digest-key 5 md5 7 (...)
ip ospf priority 255
ip ospf lls disable
ip ospf bfd disable
load-interval 30
end
Switch-A#
Switch-A#sh run | s router ospf
router ospf 65138 vrf INTERNAL
router-id X.X.X.35
auto-cost reference-bandwidth 10000
redistribute connected subnets route-map CONNECTED-TO-OSPF-INTERNAL
redistribute static subnets route-map STATIC-TO-OSPF-INTERNAL
redistribute bgp (...) subnets route-map BGP-TO-OSPF-INTERNAL
passive-interface default
no passive-interface Vlan7
network X.X.X.128 0.0.0.31 area 0
distribute-list route-map (...) in
(...)
Switch-A#
Switch-B#sh run int vl7
Building configuration...
Current configuration : 444 bytes
!
interface Vlan7
description ** VLAN 7 **
ip vrf forwarding INTERNAL
ip address X.X.X.157 255.255.255.224
no ip redirects
no ip proxy-arp
standby 37 ip X.X.X.158
standby 37 preempt
standby 37 authentication md5 (...)
ip ospf authentication message-digest
ip ospf message-digest-key 5 md5 7 (...)
ip ospf priority 254
ip ospf lls disable
ip ospf bfd disable
load-interval 30
end
Switch-B#sh run | s router ospf
router ospf 65138 vrf INTERNAL
router-id X.X.X.36
auto-cost reference-bandwidth 10000
redistribute connected subnets route-map CONNECTED-TO-OSPF-INTERNAL
redistribute static subnets route-map STATIC-TO-OSPF-INTERNAL
redistribute bgp (...) subnets route-map BGP-TO-OSPF-INTERNAL
passive-interface default
no passive-interface Vlan7
network X.X.X.128 0.0.0.31 area 0
distribute-list route-map (...) in
(...)
Switch-B#
What I already ruled out:
- there is no mac-move happening on the switches
- STP topology changes are happening much less frequently (1x per 2-3 weeks)
- there are no OSPF checksum errors in the show ip traffic output (see the command sketch after this list)
- we disabled LLS
- we disabled BFD (on the FWs, too)
- CPU utilization is normal
- no errors on the interfaces
- there is no MTU mismatch between the switches and the FWs
- the switches and FWs have already been rebooted, and we did an OS upgrade on every device
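To be concrete about the checksum and MTU points, these are the outputs I relied on (nothing exotic, shown here just for completeness):
! OSPF checksum error counters appear in the "OSPF statistics" section:
Switch-A#show ip traffic
! Interface MTU and error counters on the flapping VLAN:
Switch-A#show interfaces Vlan7
Switch-A#show ip interface Vlan7 | include MTU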
So it seems to me that when the issue happens, one of the switches is unable to process the LSAck for some reason.
Do you have any ideas / suggestions? If needed, I can share the pcap / OSPF debug outputs, too.
Thanks in advance.
07-30-2023 05:33 AM
Sorry for the late reply. It's a hot summer and I took a vacation to watch Peaky Blinders all day (I didn't have time to see it before).
Anyway, the large number of LSAs is the real issue here.
I first thought there was an MTU mismatch, but even that would not explain the large number of LSA updates.
Then a lamp lit up in my brain: these LSAs can come from the connected routes redistributed under OSPF, and those connected routes are VPN /32 host routes.
When a VPN connects, the new /32 is added to the RIB and OSPF sends an LSA about it to all neighbors. Now imagine there are more than 1000 VPNs going active and inactive; this makes OSPF go crazy.
One LSA for each VPN... and high CPU utilization.
Solution: add an area range to filter the /32 LSAs, so that OSPF advertises only the range and not each /32 - something like the sketch below.
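A minimal sketch of that idea - assuming the /32s really enter OSPF through the connected redistribution, they would be external (Type 5) LSAs, so summary-address on the ASBR (rather than area range, which only summarizes inter-area routes) is what would collapse them; the prefix below is only a placeholder for the real VPN pool:
router ospf 65138 vrf INTERNAL
 ! placeholder prefix - replace with the actual VPN host-route range
 summary-address 192.0.2.0 255.255.255.0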
07-30-2023 09:34 AM
MHM and everyone,
A brief update - I hope @Gergely Racz won't mind:
- We have been privately analyzing packet captures and OSPF debugs made on SwA and SwB during the time of the issue.
- The amount of LSUs is not a problem on its own - as Gergely mentioned earlier in this thread, they redistribute BGP into OSPF, so this results in many Type 5 LSAs which get updated frequently due to the churn in BGP. Once again, the count of LSAs and the volume of LSUs is not seen by us as a problem on its own for now.
- After receiving the multicast LSUs from SwA/SwB, the Checkpoint firewall correctly sends so-called delayed LSAck packets to 224.0.0.6 as requested by RFC 2328 Section 13.5.
- Both switches receive these LSAcks from the firewall and pass them to the supervisor and the OSPF process in IOS.
- Randomly, one of the switches fails to completely process a received LSAck in the OSPF process. The OSPF process reports the LSAck's arrival in debugs and starts working on it but it does not completely process all its contents, only a few initial LSA headers from it. As of now, we have no explanation for that. As a result, the unprocessed LSA headers from the received LSAck are considered unacknowledged and the switch will eventually start retransmitting them to the firewall in unicast. Do note that the other switch processes the same LSAck fully, hence the LSAck itself is not corrupt. This is a misbehavior on the Cisco part.
- The firewall silently ignores the unicast retransmissions, eventually causing the retransmitting switch to give up and destroy the adjacency. This is a misbehavior on the CheckPoint part.
We are currently deciding where to take it from here.
Best regards,
Peter
07-30-2023 09:42 AM
If the prefixes are stable, it doesn't matter how many LSAs there are. But if, as you mention, BGP is redistributed into OSPF without a filter, then any prefix flap or any VPN add/remove makes OSPF send LSAs; this huge number will surely drive CPU utilization high, and in the end OSPF sends LSA updates, doesn't receive the ACKs, and loops...
He needs to filter that, and the core only needs a default route, not the whole BGP table from the FW - something like the sketch below.
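A rough sketch only (the prefix-list name is made up here, and the real contents of the BGP-TO-OSPF-INTERNAL route-map are not shown in this thread, so adapt as needed):
ip prefix-list ONLY-DEFAULT seq 5 permit 0.0.0.0/0
!
route-map BGP-TO-OSPF-INTERNAL permit 10
 ! let only the default route through the BGP-to-OSPF redistribution
 match ip address prefix-list ONLY-DEFAULT
!
router ospf 65138 vrf INTERNAL
 ! or drop the BGP redistribution entirely and just originate a default
 default-information originate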
07-30-2023 09:52 AM
MHM,
Respectfully but firmly, no. I disagree fully with you here. Your suggestions might make sense on a very general level but they are not relevant to the issue at hand.
I made it clear what the two obvious misbehaviors are: first, the IOS OSPF process getting interrupted in the middle of processing a properly received LSAck packet and never returning to it; second, CheckPoint silently ignoring unicast LSU packets.
Best regards,
Peter
07-31-2023 01:48 AM
I've checked the CPU utilization many times, especially when I was running EPC + OSPF debugs at the same time. The CPU was never higher than 50-60%.
07-31-2023 05:02 AM
Just try to reduce the OSPF database.
08-01-2023 10:04 PM
I will try that and some other desperation actions after the customer gives me the green light. I will let you know about the outcome.
07-26-2023 04:55 AM
I understand that the data in the PCAP is sensitive. I also do not want to hijack this investigation - everyone here may have valuable observations to contribute.
Still, I would very much like to analyze the true PCAP, not the screenshots (and I'm sorry, they're not really intelligible, but it's this website making them unreadable). Is there a way you could share that file privately? As I am still an internal Cisco employee and an ex-TAC engineer, I am both bound and accustomed to handling such shared data confidentially. Of course, I understand if you decide not to.
Best regards,
Peter
07-26-2023 04:48 AM
If you download the images, you will be able to zoom in.
08-29-2023 10:43 AM - edited 08-29-2023 10:51 AM
Hi everyone,
Gergely and I have been communicating directly, and I also got looped into the TAC case that was raised for this issue, leading the investigation. With Gergely's permission, I'd like to share the explanation we arrived at.
The root cause of the problem was the fact that the input traffic rules on the Checkpoint firewall were misconfigured - they did not allow unicast OSPF LSU packets (retransmissions) to be accepted. Hence, when either of the Catalyst switches started sending unicast LSU retransmissions to the firewall, it would silently drop them, causing the switch to give up after 25 retransmissions and drop the adjacency.
However, the reason why the Catalyst switches engaged in the unicast retransmissions was much more involved. Analyzing the packet captures performed on the Catalyst switch supervisors (using "monitor capture ... control-plane both"), we could see that the firewall successfully acknowledged all LSAs flooded from the switches as multicasts, and the corresponding LSAck packets from the firewall were all received by both switches properly. There was no apparent reason for the switches to start the unicast retransmissions because all LSAs flooded from the switches were acknowledged by the firewall and no LSAck packets were lost. The majority of our investigation focused on explaining these apparently unjustified retransmissions initiated at random moments by either of the Catalyst switches.
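For completeness, a capture definition along these lines would do it (IOS-XE Embedded Packet Capture; the exact syntax differs slightly between releases and platforms, so treat this as a sketch):
monitor capture OSPFCAP control-plane both
monitor capture OSPFCAP match any
monitor capture OSPFCAP buffer size 10
monitor capture OSPFCAP start
! ... reproduce the flap ...
monitor capture OSPFCAP stop
monitor capture OSPFCAP export bootflash:OSPFCAP.pcap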
Eventually, the behavior of the switches was identified as expected. In 2011, tracked in CSCtu25818 for OSPFv2 and CSCtx15575 for OSPFv3, Cisco implemented a scalability enhancement to the IOS OSPF implementation to prevent retransmissions caused by late processing of LSAck packets. The gist of this enhancement is to have the IOS OSPF router process put the incoming unprocessed non-hello OSPF packets into two software queues: LSAcks go into one queue (called high priority queue) while DBDs, LSRs and LSUs go into the other queue (called low priority queue). Whenever the IOS OSPF process dequeues packets, it always first dequeues packets from the high priority queue that exclusively holds LSAcks. Only if the high priority queue is empty, the IOS OSPF process dequeues the low priority queue. Essentially, for the non-hello packets, the IOS OSPF router process always processes incoming LSAcks first, and only processes other incoming packets if there are no more LSAcks to process. This is not just a proprietary quirk: RFC 4222 specifically endorses prioritizing the processing of incoming Hello and LSAck packets over the rest of OSPF packets.
This behavior can lead to an interesting scenario which was indeed occurring in Gergely's customer's network:
- The switch A (DR or BDR) bursts out a large number of LSUs to 224.0.0.5.
- Both the switch B (BDR or DR) and the firewall receive these LSUs.
- The firewall promptly sends back a number of LSAcks to 224.0.0.6, acknowledging the LSAs received in the LSUs.
- Switch B may have processed a few of the received LSUs in the meantime, but after receiving the burst of LSAcks from the firewall, it will process these first as they are placed into the high priority queue. Only after there are no more LSAcks to process, it will continue processing the remaining LSUs from the low priority queue.
- As a result, some of the LSAs received and acknowledged by the firewall are not yet installed in the LSDB of switch B and put on the retransmit queues for OSPF neighbors when the LSAcks from the firewall are processed; these LSAs are still waiting in the low priority queue on switch B.
- Consequently, when switch B processes LSAcks sent by the firewall, LSAs that already are in the LSDB on switch B will be removed from the retransmit list for the firewall if they are on the list; for the rest of the acknowledged LSAs that are not in the switch B's LSDB, switch B does nothing.
- After switch B processes all received LSAcks, it returns to the low priority queue and processes the remaining received LSUs. The LSAs in these LSUs will be installed into the LSDB and put on the retransmit list for the firewall as usual (but not flooded to the firewall again because switch A as the DR/BDR did the flooding already). As a result of putting these LSAs on the retransmit list, switch B will wait for acknowledgements from the firewall – but the firewall already sent them and so has no reason to send them again; it does not even know that switch B waits for the acknowledgements.
- Consequently, switch B will eventually start the unicast retransmissions toward the firewall – and since the firewall drops the unicast OSPF retransmissions, switch B will eventually give up and drop the adjacency.
This completely explains the reason why either of the Catalyst switches - based on event timing - started the unicast retransmissions despite the firewall acknowledging all received LSAs. Correcting the input rules on the firewall to permit incoming unicast OSPF LSU packets resolved the issue, and we know that the retransmissions from the Catalyst switches are justified and not a result of a misbehavior.
As a side note, as we had to deal with large amounts of debug outputs, the following configuration was very useful on the Catalyst switches as it
1) prepends every logging message with a monotonically increasing sequence number, allowing us to identify any lost or rate-limited logging message
2) enables persistent logging of system messages into files in the bootflash:
3) increases the size of the internal memory buffer to hold logging messages before they are sent to their individual logging destinations to prevent losing logging messages due to overfilling this internal buffer
4) increases the rate limit for the logging messages to prevent losing logging messages due to rate limiting
configure terminal
service sequence-numbers
logging persistent filesize 1048576 immediate url bootflash:/logs
logging queue-limit 100000
logging rate-limit 7000
end
In fact, at the beginning of our investigation, we were losing debugging messages in the logs, and that misled me into believing that the Catalyst switches were incompletely processing received LSAcks. In reality, the received LSAcks were always processed completely, but the debug messages were rate-limited, which gave the impression that the processing of certain LSAcks was only partial. That was a red herring.
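Incidentally, the rate limiting itself is easy to spot: the first lines of show logging keep a counter of rate-limited messages (the exact wording varies a bit by release):
! Check the "messages rate-limited" counter near the top of the output
! to see whether debug lines are being dropped before they reach the buffer.
Switch-A#show logging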
Hopefully this will be of interest and help to anyone who might be facing similar issues or spurious apparently inexplicable retransmissions in IOS/IOS XE OSPF.
Best regards,
Peter
08-29-2023 01:18 PM
A phenomenal and well executed solution statement with great detail as always. If I could give this more points/likes I would. Thank you for not only diving very deep into this issue but providing the explanation that surpasses expectations (at least mine). You truly are an asset to this forum and networking in general. Always a pleasure reading your work and analysis of issues here in the community. Please don't stop anytime soon. Your contributions are appreciated.
Hope you're having a wonderful week!
-David
08-30-2023 05:45 AM
Hello Peter,
I just want to say thank you here on this platform as well. I really doubt we could have ever solved this without your help.
BR,
Gergely
08-30-2023 04:20 PM
I thoroughly enjoyed helping with this issue! It's been a while since I have been so thrilled about resolving a mysterious behavior in a fundamental networking protocol. And on top of it, I have had the pleasure of getting to know you and working together with you which made the entire experience so much more valuable for me.
Thank you!
Best regards,
Peter
08-30-2023 02:00 PM
As always, the level of detail is flabbergasting, and yet, exquisite. Peter, please don't stop being the inquisitive mind you are, these are the discussions we always need here.
Thank you for such a display of camaraderie, perseverance, dedication and expertise.