Netflow caused bundle interface flap

Evan Roggenkamp
Level 1

Good afternoon

Today I attempted to configure the following on an ASR 9010 in our network: 

    snmp-server ifindex persist
    snmp-server mibs cbqosmib persist
    flow exporter-map SOLARWINDS-IPV4
     transport udp 2055
     source Loopback0
     destination 1.1.1.1
    !
    flow monitor-map PRIMARY-IPV4
     record ipv4
     exporter SOLARWINDS-IPV4
     cache entries 100000
     cache timeout active 1
     cache timeout inactive 10
    !
    sampler-map ONE_PER_ONE
     random 1 out-of 1
    !

I then added the flow monitor to one of our bundle interfaces and this caused the bundle interface to flap. 
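The post does not show how the monitor was attached. On IOS XR, attaching the monitor map and sampler map to a bundle interface typically looks like the sketch below. The interface name (Bundle-Ether10, taken from the log messages that follow) and the ingress direction are assumptions.

```text
! Assumed interface name (from the BM-DISTRIB log messages) and assumed ingress direction
interface Bundle-Ether10
 flow ipv4 monitor PRIMARY-IPV4 sampler ONE_PER_ONE ingress
!
```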

    LC/0/6/CPU0:Jul 14 12:44:48.419 CDT: bfd_agent[126]: %L2-BFD-6-SESSION_STATE_DOWN : BFD session to neighbor 10.164.254.41 on interface TenGigE0/6/0/5 has gone down. Reason: Control timer expired 
    RP/0/RSP1/CPU0:Jul 14 12:44:48.427 CDT: BM-DISTRIB[1148]: %L2-BM-5-MBR_BFD_SESSION_DOWN : The BFD session on link TenGigE0/6/0/5 in Bundle-Ether10 has timed out. The member will be removed from the active members of the bundle.
    RP/0/RSP1/CPU0:Jul 14 12:44:48.427 CDT: BM-DISTRIB[1148]: %L2-BM-6-ACTIVE : TenGigE0/6/0/5 is no longer Active as part of Bundle-Ether10 (BFD state of this link is Down)
    LC/0/5/CPU0:Jul 14 12:45:03.170 CDT: nfsvr[273]: %MGBL-NETFLOW-6-INFO_HIGHWATER : Cache for monitor PRIMARY-IPV4 entries 98777 exceeds highwater mark of 95000 entries

I am trying to understand why this happened. This is one of many ASR 9010 routers in my environment that have all received the configuration posted above, and many of them also have bundle interfaces.

Why was there a problem on this router only?


Aleksandar Vidakovic
Cisco Employee

hi Evan,

A sampling rate of "1 out-of 1" is very aggressive. Do you really need every single packet to be recorded by netflow? There is a policer on the netflow punt path that protects the CPU from overload, but it is possible that excessive netflow punts delayed BFD processing.
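To put the sampling rate in perspective: with random 1-in-N sampling, the expected rate of packets handed to netflow is the interface packet rate divided by N. The symbols below are introduced only for illustration; the thread does not give any traffic rates.

$$
R_{\text{sampled}} = \frac{R_{\text{pkt}}}{N}, \qquad
\frac{R_{\text{sampled}}(N=1)}{R_{\text{sampled}}(N=100)} = \frac{R_{\text{pkt}}}{R_{\text{pkt}}/100} = 100
$$

For the same traffic, moving from 1 out-of 1 to 1 out-of 100 therefore reduces the expected number of sampled packets by a factor of 100.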

You seem to be running BFD over Bundle (BoB) on this router. What are the BFD interval and multiplier?

regards,

/Aleksandar

I am not sure. Some netflow collectors have reported results that were too inaccurate for their purposes when sampling was less aggressive than that. Given the instability we experienced, I will try changing the sampling to 1 out of 100.
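A minimal sketch of the 1:100 change. The sampler map name SAMPLE-1-IN-100 is hypothetical, and the interface name and direction are the same assumptions as in the original attachment (Bundle-Ether10, ingress).

```text
! Hypothetical sampler map name: samples 1 packet out of every 100 at random
sampler-map SAMPLE-1-IN-100
 random 1 out-of 100
!
! Assumed interface and direction; swap the old sampler for the new one
interface Bundle-Ether10
 no flow ipv4 monitor PRIMARY-IPV4 sampler ONE_PER_ONE ingress
 flow ipv4 monitor PRIMARY-IPV4 sampler SAMPLE-1-IN-100 ingress
!
```

With 1-in-100 sampling, the exported packet and byte counts describe only the sampled packets. The collector therefore needs to know the sampling interval to estimate the real totals.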

Here is the BFD configuration from the interface that flapped:

    bfd address-family ipv4 multiplier 3
    bfd address-family ipv4 destination 10.164.254.41
    bfd address-family ipv4 fast-detect
    bfd address-family ipv4 minimum-interval 15
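With these values, the BFD detection time per member link works out as shown below. This assumes the neighbor uses the same interval and multiplier and that the session runs in asynchronous mode.

$$
T_{\text{detect}} = \text{multiplier} \times \text{minimum-interval} = 3 \times 15\ \text{ms} = 45\ \text{ms}
$$

If BFD control packets for a member are not received or processed for roughly 45 ms, the member can be declared down and removed from the bundle. That outcome is consistent with the "Control timer expired" reason in the log.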

For BFD over Bundle (BoB), we don't recommend configuring a minimum-interval lower than 50 ms (see http://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/asr9k_r5-3/routing/configuration/guide/b_routing_cg53xasr9k/b_routing_cg53xasr9k_chapter_0100.html#ID485__IDTBL509).

This limit is a consequence of the way BoB is implemented. Its asynchronous and echo modes of operation are explained in this document: https://supportforums.cisco.com/document/144626/bfd-support-cisco-asr9000#-bfd-over-bundle-bob-feature-operation.

We're working on a BoB hardware offload feature, which will allow this interval to be reduced significantly.

The best advice I can give at this point is to increase the BoB minimum-interval to 50 ms. This should keep BoB up when you enable netflow.
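Applied to the configuration posted earlier, the suggested change would look roughly like the sketch below. Only the minimum-interval value changes. The interface name Bundle-Ether10 is assumed from the log messages.

```text
! Assumed interface name, taken from the BM-DISTRIB log messages
interface Bundle-Ether10
 bfd address-family ipv4 multiplier 3
 bfd address-family ipv4 destination 10.164.254.41
 bfd address-family ipv4 fast-detect
 ! Raised from 15 ms to the recommended BoB minimum of 50 ms
 bfd address-family ipv4 minimum-interval 50
!
```

With the multiplier left at 3, the detection time grows from 45 ms (3 × 15 ms) to 150 ms (3 × 50 ms). This trades slower failure detection for more tolerance of CPU processing delays.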

hope this helps,

Aleksandar