08-24-2023 02:43 PM
Hello,
We are experiencing BFD session drops for a neighbor, reason ECHO FAILURE. The interface connecting to the ISP router is configured with a QoS service-policy and with CBWFQ bandwidth 200000. There is a subinterface configured with a VRF; the neighbor is the ISP router, whose IP is on the subinterface VRF.
The session drops tend to occur at the same time of day on each business day for the site, around 9am, when everyone has started work. The QoS policy shows drops on the interface at these times. This has led us to suspect that the BFD traffic is being squeezed out and dropped, causing the session flaps.
Additionally, the ISP has indicated that there is a 10 Mbps bandwidth limit on the VRFs, which does not seem to be accounted for in our QoS configuration (no QoS policy is configured on the subinterface, only on the parent interface, and there is no reference to the 10 Mbps limit anywhere).
Would these conditions result in BFD session flaps? If so, how should we go about changing the configuration to account for the ISP's 10 Mbps limit?
Config excerpts (sanitized):
interface GigabitEthernet0/0/0
mtu 9000
bandwidth 200000
no ip address
no negotiation auto
service-policy output QOS-WAN-200MB
!
interface GigabitEthernet0/0/0.410
encapsulation dot1Q 410
vrf forwarding XXXXX
ip flow monitor MONITOR_IPV4 input
ip address 192.168.1.1 255.255.255.252
no ip redirects
no ip proxy-arp
ipv6 flow monitor MONITOR_IPV6 input
bfd interval 999 min_rx 999 multiplier 3
!
policy-map QOS-WAN-200MB
class class-default
shape average 200000000
service-policy QOS-WAN-ASR-1G
policy-map QOS-WAN-ASR-1G
class QOS-PRIORITY
priority percent 20
class QOS-REAL-TIME
bandwidth remaining percent 20
queue-limit 125 packets
class QOS-SIGNALING
bandwidth remaining percent 20
queue-limit 125 packets
class QOS-TIME-SENSITIVE
bandwidth remaining percent 10
queue-limit 125 packets
class QOS-BULK
bandwidth remaining percent 1
queue-limit 125 packets
class class-default
bandwidth remaining percent 49
queue-limit 4000 packets
08-24-2023 04:47 PM - edited 08-24-2023 04:58 PM
If the provider is only allowing a particular subset of your traffic, 10 Mbps, within your overall 200 Mbps, that can be a problem.
On some platforms, I've been able to do nested shaping, such that there's shaping per "subinterface" plus shaping for the physical interface's aggregate. This capability, though, varies quite a bit per platform.
Possibly(?):
interface GigabitEthernet0/0/0
mtu 9000
bandwidth 200000
no ip address
no negotiation auto
service-policy output QOS-WAN-200MB
interface GigabitEthernet0/0/0.410
encapsulation dot1Q 410
vrf forwarding XXXXX
ip flow monitor MONITOR_IPV4 input
ip address 192.168.1.1 255.255.255.252
no ip redirects
no ip proxy-arp
ipv6 flow monitor MONITOR_IPV6 input
bfd interval 999 min_rx 999 multiplier 3
service-policy output QOS-WAN-10MB
policy-map QOS-WAN-10MB
class class-default
shape average 10000000 !also, as noted below, might reduce by 15%, 8500000
service-policy QOS-WAN-ASR-1G
policy-map QOS-WAN-200MB
class class-default
shape average 200000000 !also, as noted below, might reduce by 15%, 170000000
service-policy QOS-WAN-ASR-1G
The first potential issue: on many Cisco platforms, I suspect a CBWFQ shaper or policer doesn't account for L2 overhead. If it doesn't, and if your provider is effectively limiting bandwidth to a "real" 200 or 10 Mbps, your shaper is oversubscribing the available bandwidth.
Where this is an issue, I've found it can often be mitigated by using a limit about 15% less than the nominal bandwidth.
BTW, I don't know your actual traffic mix, and I see you're also using jumbo Ethernet. In any case, 15% is a ballpark number; your average overhead may vary.
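As a rough illustration of why the headroom figure depends on traffic mix, a quick sketch of per-packet L2 overhead versus packet size. The 42-byte figure is an assumption for dot1Q-tagged Ethernet (preamble 7 + SFD 1 + MAC header 14 + 802.1Q tag 4 + FCS 4 + inter-frame gap 12); your media may differ.

```python
# Sketch: percent of wire bandwidth consumed by L2 framing, assuming
# 42 bytes of overhead per dot1Q-tagged Ethernet frame (preamble, SFD,
# MAC header, 802.1Q tag, FCS, inter-frame gap).
L2_OVERHEAD = 42

def overhead_pct(ip_packet_bytes: int) -> float:
    """Percent of wire bandwidth consumed by L2 framing for one packet size."""
    wire = ip_packet_bytes + L2_OVERHEAD
    return 100.0 * L2_OVERHEAD / wire

for size in (64, 256, 1500, 9000):
    print(f"{size:5d}-byte packet: {overhead_pct(size):.1f}% L2 overhead")
```

For a small-packet-heavy mix the overhead approaches 40%, while 1500-byte and jumbo frames come in under 3%; that spread is exactly why 15% is only a ballpark.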
Second, it's unclear where your BFD packets are being directed. Possibly class-default, which is FIFO; if BFD is going there, that's probably not the best place for it.
So, you might want to preclude FIFO issues, elevate BFD's priority, or bypass the shaper altogether. Most of these require identifying BFD packets.
preclude FIFO issues:
class class-default
bandwidth remaining percent 49
fair-queue !NB: flows are directed to a pool of flow queues, i.e. more than one flow can be using a flow queue
queue-limit 4000 packets
elevate BFD's priority:
I.e. direct BFD to class QOS-REAL-TIME or class QOS-PRIORITY
bypass the shaper, altogether:
policy-map QOS-WAN-10MB
class BFD
priority 8000 !minimum value? enough for BFD?
class class-default
shape average 10000000
service-policy QOS-WAN-ASR-1G
policy-map QOS-WAN-200MB
class BFD
priority 8000 !minimum value? enough for BFD?
class class-default
shape average 200000000
service-policy QOS-WAN-ASR-1G
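The class BFD above presumes BFD packets can be matched. A sketch of that identification piece, assuming single-hop BFD (control on UDP destination port 3784, echo on 3785, per RFC 5881); the ACL and class names here are placeholders, so verify port usage on your platform:
ip access-list extended ACL-BFD
 permit udp any any eq 3784 !single-hop BFD control
 permit udp any any eq 3785 !BFD echo
!
class-map match-all BFD
 match access-group name ACL-BFD
Note BFD echo packets are addressed to the sender's own interface IP (the neighbor loops them back), so matching on the UDP ports, rather than on addresses, is the simpler approach.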
Also BTW, you can still "oversubscribe" a provider's shaper or policer (or a physical downstream interface at that physical bandwidth) when your Bc/Be and/or Tc values aren't aligned with the provider's. Another reason to run a little slower, to avoid some of these issues.
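On the Bc/Tc point, a quick sketch of the arithmetic for a simple single-rate shaper, where the per-interval burst allowance is Bc = CIR x Tc. The 25 ms and 8 ms interval values below are assumptions for illustration, not your actual platform or provider settings:

```python
# Sketch: per-interval burst allowance Bc = CIR * Tc for a simple
# single-rate shaper. If your shaping interval is longer than the
# provider's policing interval, a burst that is legal for your shaper
# can exceed the provider's per-interval allowance and be dropped.
def bc_bits(cir_bps: int, tc_seconds: float) -> float:
    """Bits permitted per interval at the given rate and interval length."""
    return cir_bps * tc_seconds

your_bc = bc_bits(10_000_000, 0.025)      # your shaper, Tc = 25 ms (assumed)
provider_bc = bc_bits(10_000_000, 0.008)  # provider policer, Tc = 8 ms (assumed)

print(f"your per-interval burst:     {your_bc:,.0f} bits")
print(f"provider per-interval limit: {provider_bc:,.0f} bits")
```

Same CIR on both sides, yet a burst that fits your 25 ms interval can be more than three times what the provider permits in its 8 ms interval; shaving the shaped rate narrows that exposure.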
08-26-2023 04:07 PM - edited 08-26-2023 04:07 PM
@Verbatim I see you marked my reply as a solution. Just curious, if you feel like telling, what did you do and what was the effect?
08-29-2023 02:24 PM - edited 08-29-2023 02:24 PM
There are documents / procedures to follow before I can actually work on this (the earliest start would be 9/1/23). I've started the process, intending to apply QOS-WAN-10MB on the interface, with the limit set about 15% below nominal to allow for overhead. We'll see how that goes for a few business days, and then decide on next steps (like the BFD prioritization, etc.).
I didn't want to let this sit for days with no activity. If an issue comes up with the attempt, I can post back here or just start another thread (and reference this one).