
BFD Session Drops

Verbatim
Level 1

 

Hello,

 

We are experiencing BFD session drops for a neighbor, with reason ECHO FAILURE. The physical interface connecting to the ISP router is configured with a QoS service-policy and with bandwidth 200000 (used by CBWFQ). There is a subinterface configured with a VRF; the BFD neighbor is the ISP router, whose IP address is on that VRF subinterface.

 

The session drops tend to take place at the same time of day on each business day for the site, around 9 a.m., when everyone has started work. The QoS policy shows drops on the interface at these times. This has led to the suspicion that the BFD traffic is getting squeezed out and dropped, causing the session flaps.
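
For reference, the correlation so far is based on standard show commands along these lines (nothing beyond stock IOS-XE output assumed):

show policy-map interface GigabitEthernet0/0/0 output
show bfd neighbors details
show logging | include BFD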

 

Additionally, the ISP has indicated that there is a 10 Mbps bandwidth limit on the VRF, which does not seem to be accounted for in our QoS configuration (no QoS policy is configured on the subinterface, only on the parent interface, and there is no reference to the 10 Mbps anywhere).

 

Would these conditions result in BFD session flaps? If so, how should we go about changing the configuration to account for the ISP's 10 Mbps limit?

 

Config excerpts (sanitized):

interface GigabitEthernet0/0/0
 mtu 9000
 bandwidth 200000
 no ip address
 no negotiation auto
 service-policy output QOS-WAN-200MB
!
interface GigabitEthernet0/0/0.410
 encapsulation dot1Q 410
 vrf forwarding XXXXX
 ip flow monitor MONITOR_IPV4 input
 ip address 192.168.1.1 255.255.255.252
 no ip redirects
 no ip proxy-arp
 ipv6 flow monitor MONITOR_IPV6 input
 bfd interval 999 min_rx 999 multiplier 3
!

policy-map QOS-WAN-200MB
 class class-default
  shape average 200000000   
   service-policy QOS-WAN-ASR-1G

policy-map QOS-WAN-ASR-1G
 class QOS-PRIORITY
  priority percent 20
 class QOS-REAL-TIME
  bandwidth remaining percent 20 
  queue-limit 125 packets
 class QOS-SIGNALING
  bandwidth remaining percent 20 
  queue-limit 125 packets
 class QOS-TIME-SENSITIVE
  bandwidth remaining percent 10 
  queue-limit 125 packets
 class QOS-BULK
  bandwidth remaining percent 1 
  queue-limit 125 packets
 class class-default
  bandwidth remaining percent 49 
  queue-limit 4000 packets

3 Replies

Joseph W. Doherty
Hall of Fame

If the provider is only providing 10 Mbps for a particular subset of traffic within your overall 200 Mbps, that can be a problem.

On some platforms, I've been able to do nested shaping, such that there's shaping per "subinterface" as well as shaping for the physical interface's aggregate.  Again, this capability varies considerably by platform.

Possibly(?):

interface GigabitEthernet0/0/0
mtu 9000
bandwidth 200000
no ip address
no negotiation auto
service-policy output QOS-WAN-200MB

interface GigabitEthernet0/0/0.410
encapsulation dot1Q 410
vrf forwarding XXXXX
ip flow monitor MONITOR_IPV4 input
ip address 192.168.1.1 255.255.255.252
no ip redirects
no ip proxy-arp
ipv6 flow monitor MONITOR_IPV6 input
bfd interval 999 min_rx 999 multiplier 3
service-policy output QOS-WAN-10MB

policy-map QOS-WAN-10MB
class class-default
shape average 10000000 !also, as noted below, might reduce by 15%, 8500000
service-policy QOS-WAN-ASR-1G

policy-map QOS-WAN-200MB
class class-default
shape average 200000000 !also, as noted below, might reduce by 15%, 170000000
service-policy QOS-WAN-ASR-1G

The first potential issue: on many Cisco platforms, I suspect a CBWFQ shaper or policer doesn't account for L2 overhead.  If it doesn't, and if your provider is effectively limiting bandwidth to a "real" 200 or 10 Mbps, your shaper is oversubscribing the bandwidth.

The foregoing, if it is an issue, can often be mitigated, I've found, by using a limit about 15% less than the nominal bandwidth.

BTW, I don't know your actual traffic mix, and I see you're also using jumbo Ethernet.  In any case, 15% is a ballpark number; your average overhead may vary.
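
As a rough illustration (my numbers, and assuming the shaper counts only the L3/IP length on a dot1q-tagged link): each frame adds about 18 bytes of Ethernet header/FCS, 4 bytes of dot1q tag, and 20 bytes of preamble plus inter-frame gap, i.e. roughly 42 bytes on the wire per packet.  A 1300-byte packet then carries about 3% overhead, while a 100-byte packet (think BFD or voice) carries about 42%, which is why a small-packet-heavy mix pushes the ballpark toward 15%.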

Second, it's unclear where you're directing your BFD packets; possibly class-default, which is FIFO.  If BFD is going there, that's probably not the best place for it.

So, you might also want to preclude FIFO issues, elevate BFD's priority, or bypass the shaper altogether.  Most of these would require identification of the BFD packets.
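
For that identification, a minimal sketch, assuming single-hop BFD on its standard UDP ports (3784 for control, 3785 for echo, per RFC 5881); the ACL name is just illustrative, and whether your platform's output MQC policy actually sees locally generated BFD is worth verifying:

ip access-list extended ACL-BFD
 permit udp any any eq 3784
 permit udp any any eq 3785
!
class-map match-any BFD
 match access-group name ACL-BFD

That's the class-map the class BFD examples below assume; if your platform already marks BFD as CS6, matching on DSCP may be simpler.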

preclude FIFO issues:

class class-default
bandwidth remaining percent 49
fair-queue !NB: flows are directed to a pool of flow queues, i.e. more than one flow can be using a flow queue
queue-limit 4000 packets

elevate BFD's priority:

I.e. direct BFD to class QOS-REAL-TIME or class QOS-PRIORITY

bypass the shaper altogether:

policy-map QOS-WAN-10MB
class BFD
priority 8000 !minimum value? enough for BFD?
class class-default
shape average 10000000
service-policy QOS-WAN-ASR-1G

policy-map QOS-WAN-200MB
class BFD
priority 8000 !minimum value? enough for BFD?
class class-default
shape average 200000000
service-policy QOS-WAN-ASR-1G
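
For scale (rough arithmetic, not platform-verified): a single-hop BFD control packet is on the order of 66 to 100 bytes on the wire, and at your 999 ms interval that's roughly one packet per second per session, i.e. a few kbps at most even with echo packets included.  So almost any priority allocation covers BFD itself; the priority value mainly matters if other traffic lands in that class too.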

Also BTW, you can "oversubscribe" a provider's shaper or policer (or a physical downstream interface with that physical bandwidth) when your Bc/Be values and/or Tc intervals aren't aligned with theirs: a burst that's legal within your shaper's interval can still exceed what their policer permits within its measurement window.  Another reason to run a little slower, to avoid some of those issues.

Joseph W. Doherty
Hall of Fame

@Verbatim I see you marked my reply as a solution.  Just curious, if you feel like telling, what did you do and what was the effect?

There are documents/procedures to follow before I can actually work on this (the earliest start would be 9/1/23). I've started the process, intending to apply QOS-WAN-10MB on the subinterface and to set the limit to allow for the ~15% overhead. We'll see how that goes for a few business days, and then decide on next steps (like the BFD prioritization, etc.).
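
Concretely, the change I intend to request is along these lines (a sketch based on the suggestion above; the 8500000 figure is simply 10 Mbps reduced by roughly 15% and may get tuned):

policy-map QOS-WAN-10MB
 class class-default
  shape average 8500000
   service-policy QOS-WAN-ASR-1G
!
interface GigabitEthernet0/0/0.410
 service-policy output QOS-WAN-10MB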

I didn't want to let this sit for days with no activity. If an issue comes up with the attempt, I can post back here or just start another thread (and reference this one).