I have two sites, each with a pair of Cat9500-32C switches in StackWise Virtual as core switches. The sites are connected by 1x 10G and 2x 1G L2 fibre circuits (essentially dark fibre). We have configured the links between the sites as L3 and have EIGRP adjacencies between the two cores. Prefixes use the 10G link and appear as feasible successors via the 1G links, as expected. Because the circuit providers do not forward link-loss events, we had a gold partner implement BFD with values of 100, 100, 3 (interval, min_rx, multiplier), and of course the BFD interfaces are added to the EIGRP config.
Yesterday, we saw a number of BFD link failure events across multiple links. I'm confident that the actual circuits did not fail, so we believe this is a result of the BFD timers being too aggressive. The C9500s are ticking over and never go above 6% CPU utilisation.
According to the IOS XE guide, the Cat 9500 high-performance models should be using a minimum of 750 ms to prevent a BFD failure in the event of a redundancy switchover. However, I also note that BFD traffic falls into class-default, so I am guessing there may have been a microburst and the BFD packets were dropped by QoS? We are using the out-of-the-box QoS config on the Cat 9500s and no softmax multipliers.
Can anyone point me towards any documentation or experience around calculating what BFD timers we should use?
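For context, the failure-detection window implied by a set of BFD timers is commonly taken as interval x multiplier, so our 100/100/3 gives roughly 300 ms. Below is a minimal Python sketch of that arithmetic, assuming both ends negotiate the same interval. The function name is made up for illustration and is not from any Cisco tool.

```python
def bfd_detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Approximate detection window: the peer is declared down after
    `multiplier` consecutive packets are missed at `interval_ms` spacing.
    Assumes both ends negotiated the same interval."""
    if interval_ms <= 0 or multiplier <= 0:
        raise ValueError("interval and multiplier must be positive")
    return interval_ms * multiplier


if __name__ == "__main__":
    # Compare the currently configured timers with some slower candidates.
    for interval_ms in (100, 250, 750):
        window = bfd_detection_time_ms(interval_ms, 3)
        print(f"interval {interval_ms} ms x 3 -> {window} ms detection window")
```

A larger window tolerates longer bursts of loss or delay before the session is torn down, at the cost of slower failover.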
Hi BB, yes, that was the document I saw. In fact, they are L3 interfaces, so in my original post it should read 250 ms, not 750 ms.
In the case of the 10G link, this is what I'm showing:
C9500-32C#sh run int hu1/0/13
interface HundredGigE1/0/13
 no switchport
 ip address 192.168.254.248 255.255.255.254
 no ip redirects
 no ip unreachables
 bfd interval 100 min_rx 100 multiplier 3
C9500-32C#sh bfd neighbors interface hu1/0/13 details
NeighAddr          LD/RD    RH/RS    State    Int
192.168.254.249    7/12     Up       Up       Hu1/0/13
Session state is UP and using echo function with 100 ms interval.
Session Host: Software
Local Diag: 0, Demand mode: 0, Poll bit: 0
MinTxInt: 1000000, MinRxInt: 1000000, Multiplier: 3
Received MinRxInt: 1000000, Received Multiplier: 3
Holddown (hits): 0(0), Hello (hits): 1000(94611)
Rx Count: 94666, Rx Interval (ms) min/max/avg: 1/2044/876 last: 159 ms ago
Tx Count: 94646, Tx Interval (ms) min/max/avg: 1/1005/878 last: 698 ms ago
Elapsed time watermarks: 0 0 (last: 0)
Registered protocols: CEF EIGRP
Last packet: Version: 1 - Diagnostic: 0
State bit: Up - Demand bit: 0
Poll bit: 0 - Final bit: 0
C bit: 0
Multiplier: 3 - Length: 24
My Discr.: 12 - Your Discr.: 7
Min tx interval: 1000000 - Min rx interval: 1000000
Min Echo interval: 100000
[Snip multiple occurrences of the below throughout the day]
May 26 15:18:17.311: %DUAL-5-NBRCHANGE: EIGRP-IPv4 500: Neighbor 192.168.254.249 (HundredGigE1/0/13) is up: new adjacency
May 26 15:18:17.311: %BFD-6-BFD_SESS_CREATED: BFD-SYSLOG: bfd_session_created, neigh 192.168.254.249 proc:EIGRP, idb:HundredGigE1/0/13 handle:2 act
May 26 15:25:30.377: %DUAL-5-NBRCHANGE: EIGRP-IPv4 500: Neighbor 192.168.254.249 (HundredGigE1/0/13) is down: BFD peer down notified
May 26 15:25:34.978: %DUAL-5-NBRCHANGE: EIGRP-IPv4 500: Neighbor 192.168.254.249 (HundredGigE1/0/13) is up: new adjacency
May 26 15:25:34.979: %BFD-6-BFD_SESS_CREATED: BFD-SYSLOG: bfd_session_created, neigh 192.168.254.249 proc:EIGRP, idb:HundredGigE1/0/13 handle:2 act
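To get a feel for how often the sessions flap and how long each outage lasts, the syslog timestamps can be diffed. The following is a rough Python sketch, assuming the log format shown above. The function names and the placeholder year are my own, since the log lines carry no year.

```python
import re
from datetime import datetime

# Matches "May 26 15:25:30.377: <message>" as seen in the syslog above.
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d\.\d{3}): (.*)$")


def parse_line(line):
    """Return (timestamp, message), or None if the line is not a log entry."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    # The log has no year, so a placeholder year is assumed here.
    ts = datetime.strptime("2000 " + m.group(1), "%Y %b %d %H:%M:%S.%f")
    return ts, m.group(2)


def flap_report(log_text):
    """Return (uptime_s, outage_s) pairs, one for each BFD-triggered flap."""
    up_at = down_at = uptime = None
    flaps = []
    for line in log_text.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, msg = parsed
        if "BFD peer down notified" in msg and up_at is not None:
            uptime = (ts - up_at).total_seconds()
            down_at = ts
        elif "is up: new adjacency" in msg:
            if down_at is not None:
                outage = (ts - down_at).total_seconds()
                flaps.append((round(uptime, 3), round(outage, 3)))
                down_at = None
            up_at = ts
    return flaps


SAMPLE = """\
May 26 15:18:17.311: %DUAL-5-NBRCHANGE: EIGRP-IPv4 500: Neighbor 192.168.254.249 (HundredGigE1/0/13) is up: new adjacency
May 26 15:25:30.377: %DUAL-5-NBRCHANGE: EIGRP-IPv4 500: Neighbor 192.168.254.249 (HundredGigE1/0/13) is down: BFD peer down notified
May 26 15:25:34.978: %DUAL-5-NBRCHANGE: EIGRP-IPv4 500: Neighbor 192.168.254.249 (HundredGigE1/0/13) is up: new adjacency
"""

if __name__ == "__main__":
    for uptime_s, outage_s in flap_report(SAMPLE):
        print(f"up for {uptime_s} s, then down for {outage_s} s")
```

Run against a full day of logs, this gives one (uptime, outage) pair per flap, which makes it easier to see whether the drops cluster around particular times.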