Solved: Re: [ASR9000] NETFLOW-6-INFO_CACHE_SIZE_EXCEEDED

ปลาวาฬทราย RMUTT CPE IX · ‎04-13-2020

Hi,

I see this log message every 30 minutes. Could you please provide an EEM script to verify whether the cache is actually exceeded at the time the message is logged?

Thank you very much.

smilstea · ‎04-13-2020

Exceeding the cache size means you need to change your timers or increase your cache size.

1.2.2. Flow Export Timers and Events

Once extracted, this flow data resides in a cache in memory and is viewable through router CLI commands. In order to be combined with NetFlow data from all other routers, however, the flow records must be exported to a NetFlow collector. This export occurs when data in the cache is recycled. Recycling can occur in one of four ways:

1) The cache entry expires due to not matching incoming traffic for a specified amount of time – the inactive timer.

2) The cache entry, though still matching incoming packets, has been in the cache so long that it exceeds another time limit – the active timer.

3) The cache becomes full, so some of the oldest entries are purged to make room for new entries. The NetFlow cache has a predefined size depending on platform and/or amount of memory available.

4) The flow terminates, e.g., for TCP flows, when a FIN or RST has been received.

https://community.cisco.com/t5/service-providers-documents/asr9k-netflow-white-paper/ta-p/3145878

Sam

ปลาวาฬทราย RMUTT CPE IX · ‎04-13-2020

Do you have any recommendation on which I should change, the timers or the cache size, and to what value? How can I tell?

Thank you very much.

smilstea · ‎04-14-2020

30 or 60 seconds for the timers is usually aggressive enough. For the cache you can use a large value; just note that it will use more RAM on the device.

It's really trial and error: try some values and see whether the syslogs go away, but also monitor memory and CPU utilization.
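
A sketch of what such a trial could look like, assuming a monitor map named FLOW_MONITOR (a placeholder, not taken from this thread) and reading "30 or 60" as seconds. The values are starting points to experiment with, not recommendations for any particular network:

```
flow monitor-map FLOW_MONITOR
 ! export long-lived flows after 60 seconds in the cache (assumed value)
 cache timeout active 60
 ! export flows that have seen no packets for 30 seconds (assumed value)
 cache timeout inactive 30
!
```

Shorter timers empty the cache sooner but send more records to the collector, so watch both the syslog and the collector load after each change.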

Sam

AARON WEINTRAUB · ‎04-22-2020

So the issue is you have flows being ADDED, flows being KEPT, and flows ENDED.

To see the flow cache size, run: show flow monitor <flow monitor map name> cache summ loc 0/X/cpu0

You may simply need to increase the size of the cache, which is done with "flow monitor-map <name> cache entries <number>". That will allow you to store more entries before the NF cache fills up.

One issue you MAY run into is that flows are being added FASTER than they can be sent out. A flow is "marked eligible" for export if it has been inactive for longer than the inactive timer OR if it has been in the cache for longer than the active timer, no matter what. Keep in mind that if the flow is still going on after the active timer ends it, it will just start again in the new cycle.

So let's say at t=0 you have 3000 flows added, they go on continuously for 2 minutes, and you have the active flow timeout set to 60. Since they're active, at t=60s the flow monitor will terminate them and mark them finished. However, by default there is a maximum 2000 fps rate limit on the NF entries EACH line card will send out to the flow collector, so those 3000 entries cannot all leave the cache at once; at 2000 per second they need at least 1.5 seconds to drain. Depending on your collector you might also run into odd problems like NF entries showing up 30s late, but that is a more complicated issue to deal with.

In the end, simply increasing your bucket size (cache) will probably help, BUT if entries are being added to the bucket faster than they are being taken out (cache timeout rate-limit), then making it gigantic won't help.
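
For reference, both knobs mentioned above live under the flow monitor-map. A minimal sketch, assuming a monitor map named FLOW_MONITOR (a placeholder, not from the original poster's config) and illustrative values; check the ranges supported on your release before applying anything like this:

```
flow monitor-map FLOW_MONITOR
 ! grow the bucket; more entries means more line card memory in use (assumed value)
 cache entries 500000
 ! raise the per-line-card export limit above the 2000 default (assumed value)
 cache timeout rate-limit 5000
!
```

In the simplified 3000-flow example, a limit of 5000 would let the expired entries drain in roughly 0.6 seconds instead of 1.5, but the export path and the collector then have to absorb the higher burst rate.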

ปลาวาฬทราย RMUTT CPE IX · ‎04-23-2020

I fixed it by changing from:

sampler-map FLOW_SAMPLER
 random 1 out-of 100
!

to

sampler-map FLOW_SAMPLER
 random 1 out-of 1000
!

Thank you very much.

AARON WEINTRAUB · ‎04-23-2020

That is also a legitimate solution, but you don't necessarily want to undersample if you have very low traffic volume. Still, 1 in 1000 is probably a good number for most networks.
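
For context, the sampling rate only takes effect where the sampler is attached together with the monitor on an interface. A sketch, assuming a hypothetical interface and a monitor map named FLOW_MONITOR (neither appears in this thread):

```
sampler-map FLOW_SAMPLER
 random 1 out-of 1000
!
interface TenGigE0/0/0/0
 ! interface name and monitor map name are placeholders
 flow ipv4 monitor FLOW_MONITOR sampler FLOW_SAMPLER ingress
!
```

Going from 1 out-of 100 to 1 out-of 1000 means roughly ten times fewer packets are examined, so fewer flow entries tend to be created in the cache.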