01-19-2010 05:00 AM - edited 03-06-2019 09:21 AM
Last Friday we had a 10-minute outage because of the following errors:
2010-01-15 16:26:45 Local7.Warning 10.65.63.2 148586: Jan 15 16:28:55.663 EDT: %EARL-DFC2-4-NF_USAGE: Current Netflow Table Utilization is 74%
2010-01-15 16:26:55 Local7.Warning 10.65.63.2 148587: Jan 15 16:29:06.015 EDT: %EARL_NETFLOW-DFC2-4-TCAM_THRLD: Netflow TCAM threshold exceeded, TCAM Utilization [99%]
2010-01-15 16:27:01 Local7.Warning 10.65.63.3 180881: Jan 15 16:29:11.628 EDT: %EARL_NETFLOW-DFC2-4-TCAM_THRLD: Netflow TCAM threshold exceeded, TCAM Utilization [90%]
2010-01-15 16:27:07 Local7.Warning 10.65.63.3 180882: Jan 15 16:29:18.004 EDT: %EARL_NETFLOW-DFC1-4-TCAM_THRLD: Netflow TCAM threshold exceeded, TCAM Utilization [99%]
2010-01-15 16:27:09 Local7.Warning 10.65.63.2 148588: Jan 15 16:29:20.039 EDT: %EARL_NETFLOW-SP-4-TCAM_THRLD: Netflow TCAM threshold exceeded, TCAM Utilization [97%]
2010-01-15 16:27:15 Local7.Warning 10.65.63.2 148589: Jan 15 16:29:25.423 EDT: %EARL_NETFLOW-SPSTBY-4-TCAM_THRLD: Netflow TCAM threshold exceeded, TCAM Utilization [96%]
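For anyone triaging similar logs, here is a minimal Python sketch that pulls the reported utilization out of lines like the ones above and keeps the peak per switch and module tag. The regular expression assumes the exact collector layout shown in this post (source IP, sequence number, then the IOS message); the function names `parse_line` and `peak_utilization` are made up for illustration.

```python
import re

# Assumed layout: "<date> <time> Local7.Warning <ip> <seq>: <IOS timestamp>: %EARL...-<tag>-4-<mnemonic>: ... NN%"
LINE_RE = re.compile(
    r"(?P<host>\d+\.\d+\.\d+\.\d+) \d+: .*?"
    r"%EARL(?:_NETFLOW)?-(?P<module>[A-Z0-9]+)-4-(?:NF_USAGE|TCAM_THRLD):"
    r".*?(?P<pct>\d+)%"
)

def parse_line(line):
    """Return (host, module_tag, percent) or None if the line is not a NetFlow usage message."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return m.group("host"), m.group("module"), int(m.group("pct"))

def peak_utilization(lines):
    """Map (host, module_tag) to the highest utilization percentage seen."""
    peaks = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        host, module, pct = parsed
        key = (host, module)
        peaks[key] = max(pct, peaks.get(key, 0))
    return peaks
```

Feeding the six lines above to `peak_utilization` gives one entry per switch and module tag, which makes it easier to see which forwarding engines filled up first.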
I looked on Cisco's web site for answers and found one that said we need to disable service internal and make the flow aging more aggressive. Has anyone had this issue before? Does anyone know what would cause this kind of issue?
Thanks
01-19-2010 05:43 AM
Hello Zoltan,
>> Does anyone know what would cause this kind of issues?
Traffic variety can cause this. The NetFlow TCAM table has a limited size, so when the switch tries to track many distinct flows for NetFlow accounting purposes, the table fills up, and in your case that had an impact.
I see DFC2 in the messages, so my guess is that your C6500 uses a supervisor2, MSFC2, PFC2 combination.
the Netflow TCAM size for sup2/MSFC2 + PFC2 is reported here:
An explanation of MLS timers for netflow table is here:
To keep the NetFlow table size below the recommended utilization, enable the following parameters when using the mls aging command:
•normal—Configures an inactivity timer. If no packets are received on a flow within the duration of the timer, the flow entry is deleted from the table.
•fast aging—Configures an efficient process to age out entries created for flows that only switch a few packets, and then are never used again. The fast aging parameter uses the time keyword value to check if at least the threshold keyword value of packets have been switched for each flow. If a flow has not switched the threshold number of packets during the time interval, then the entry is aged out.
•long—Configures entries for deletion that have been active for the specified value even if the entry is still in use. Long aging is used to prevent counter wraparound, which can cause inaccurate statistics.
The suggestion for tuning is the following:
If you need to enable MLS fast aging time, initially set the value to 128 seconds. If the size of the NetFlow table continues to grow over the recommended utilization, decrease the setting until the table size stays below the recommended utilization. If the table continues to grow over the recommended utilization, decrease the normal MLS aging time.
You can only attempt to prevent the table from filling up, and the price to pay is less accuracy in NetFlow accounting.
With aggressive timers you can "miss" some flows: because they last only a short time, or do not carry enough packets in a given time window as explained above, they are removed from the table and are not exported to the NetFlow collector.
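The three aging rules quoted above can be sketched as a simplified model in Python. This is only an illustration of the documented behaviour, not how the PFC implements it in hardware; the `Flow` class, the `aging_reason` function, and all timer values used with it are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    created_at: float  # time the entry was created, in seconds
    last_seen: float   # time of the most recent packet, in seconds
    packets: int       # packets switched on this entry so far

def aging_reason(flow, now, normal, fast_time, fast_threshold, long):
    """Return which timer would delete the entry at time `now`, or None to keep it.

    normal: inactivity timer; fast_time/fast_threshold: fast aging; long: maximum entry age.
    """
    age = now - flow.created_at
    idle = now - flow.last_seen
    if idle >= normal:
        return "normal"   # no packets received within the inactivity timer
    if age >= long:
        return "long"     # active too long, removed even though still in use
    if age >= fast_time and flow.packets < fast_threshold:
        return "fast"     # only a few packets switched during the fast aging interval
    return None
```

With illustrative timers of `normal=300`, `fast_time=30`, `fast_threshold=10` and `long=1920`, a two-packet DNS lookup is removed by fast aging after 30 seconds, while a busy file transfer stays in the table until it goes idle or reaches the long timer. This is the accuracy trade-off described above: the short flow frees its entry early, but whatever statistics it would have accumulated later are lost.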
There is a feature called NDE flow filter but it influences only what flows are exported to NFC not what is in the netflow TCAM table.
see
It is a trade off between accuracy and scalability / stability of device.
Other users have reported similar issues, though perhaps without an impact on traffic forwarding.
Where is the C6500 placed: in a service provider POP, or in an Internet exchange point?
It is not traffic volume that counts but how many IP flows classified per your NDE mask are seen.
So another point to investigate is which flow mask you are using now: a more detailed definition of flows generates more entries in the table.
see
This is probably the first aspect to check. As other users have reported, there are cases where different features require different flow masks.
Hope to help
Giuseppe
01-19-2010 05:59 AM
Thank you for the quick answer, Giuseppe. I have included the output of sh mod. As you can see, we use SUP720s, and the 6509s are in the core network with 4948s connected to them by 10G. Each 4948 connects to two 6509s for redundancy. The servers are attached to the 4948s. I also saw a number of NetFlow creation failures on both switches. It was strange that this caused an outage. We use Mazu devices to collect and analyse flows.
Mod Ports Card Type                              Model              Serial No.
--- ----- -------------------------------------- ------------------ -----------
  1    8  CEF720 8 port 10GE with DFC            WS-X6708-10GE      SAD112002N4
  2    8  CEF720 8 port 10GE with DFC            WS-X6708-10GE      SAD111904F8
  3    4  CEF720 4 port 10-Gigabit Ethernet      WS-X6704-10GE      SAL1109JABK
  4   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1119NJ1P
  5    2  Supervisor Engine 720 (Active)         WS-SUP720-3B       SAL1122Q527
  6    2  Supervisor Engine 720 (Hot)            WS-SUP720-3B       SAL1122QEZX
  7   48  CEF720 48 port 1000mb SFP              WS-X6748-SFP       SAL1122QCP4
  8   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL09316RWY

Mod MAC addresses                      Hw     Fw           Sw           Status
--- ---------------------------------- ------ ------------ ------------ -------
  1 001b.d483.5624 to 001b.d483.562b   1.3    12.2(18r)S1  12.2(33)SXI  Ok
  2 001b.539d.2820 to 001b.539d.2827   1.3    12.2(18r)S1  12.2(33)SXI  Ok
  3 001a.6cf5.bf54 to 001a.6cf5.bf57   2.5    12.2(14r)S5  12.2(33)SXI  Ok
  4 001b.d452.55f0 to 001b.d452.561f   2.5    12.2(14r)S5  12.2(33)SXI  Ok
  5 0016.c85e.ab24 to 0016.c85e.ab27   5.4    8.4(2)       12.2(33)SXI  Ok
  6 0017.9568.eb48 to 0017.9568.eb4b   5.4    8.4(2)       12.2(33)SXI  Ok
  7 001b.d45d.cb30 to 001b.d45d.cb5f   1.10   12.2(14r)S5  12.2(33)SXI  Ok
  8 0014.f212.3a58 to 0014.f212.3a87   2.3    12.2(14r)S5  12.2(33)SXI  Ok

Mod  Sub-Module                  Model              Serial      Hw      Status
---- --------------------------- ------------------ ----------- ------- -------
  1  Distributed Forwarding Card WS-F6700-DFC3C     SAD112104MW 1.0     Ok
  2  Distributed Forwarding Card WS-F6700-DFC3C     SAD112104FB 1.0     Ok
  3  Distributed Forwarding Card WS-F6700-DFC3B     SAD111605VH 4.6     Ok
  4  Distributed Forwarding Card WS-F6700-DFC3B     SAD11160027 4.6     Ok
  5  Policy Feature Card 3       WS-F6K-PFC3B       SAL1122Q5TN 2.3     Ok
  5  MSFC3 Daughterboard         WS-SUP720          SAL1122Q3QC 3.0     Ok
  6  Policy Feature Card 3       WS-F6K-PFC3B       SAL1122Q9LX 2.3     Ok
  6  MSFC3 Daughterboard         WS-SUP720          SAL1123QJ6W 3.0     Ok
  7  Distributed Forwarding Card WS-F6700-DFC3B     SAL1110JMHG 4.6     Ok
  8  Distributed Forwarding Card WS-F6700-DFC3A     SAL08486L4F 2.2     Ok

Mod  Online Diag Status
---- -------------------
  1  Pass
  2  Pass
  3  Pass
  4  Pass
  5  Pass
  6  Pass
  7  Pass
  8  Pass
01-19-2010 06:36 AM
Giuseppe,
Here is the configuration we have on the 6509s for NetFlow:
no mls acl tcam share-global
mls aging fast time 30 threshold 128
mls aging long 64
mls aging normal 32
mls netflow interface
mls flow ip interface-full
no mls flow ipv6
mls nde sender
ip flow-export source Vlan120
ip flow-export version 9
ip flow-export destination 10.65.63.151 4000
And under each VLAN interface:
Interface vlan90
ip flow ingress
The mls aging fast line was added after the problem we had.
01-19-2010 07:25 AM
Hello Zoltan,
You have provided a lot of details.
You have a Sup720 with PFC3B, and this is good news.
Note that the use of the most detailed flowmask
mls flow ip interface-full
is going to create more entries in the table.
Let's take an example: if we define an IP flow using only the IP source and destination addresses, any conversation between two given hosts is classified as a single flow in the table.
If we define a flow using more details, for example adding the upper-layer protocol and ports, we have one entry for telnet, one entry for web access, and so on.
So, depending on what features are configured on the device, a flow mask like
destination-source—A more-specific flow mask. The PFC maintains one entry for each source and destination IP address pair. Statistics for all flows between the same source IP address and destination IP address aggregate into this entry.
can help contain the size of the NetFlow TCAM table.
You are currently using the most specific flow mask. The two most specific masks are described as follows:
full—A more-specific flow mask. The PFC creates and maintains a separate table entry for each IP flow. A full entry includes the source IP address, destination IP address, protocol, and protocol ports.
•full-interface—The most-specific flow mask. Adds the source VLAN SNMP ifIndex to the information in the full-flow mask.
Check table 50-1 and analyze the configuration of your device. If possible, moving to the destination-source flow mask can help,
or at least moving to full instead of full-interface.
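The effect of the flow mask on table size can be illustrated with a small Python sketch. It is only a counting model under stated assumptions: the `Packet` tuple, the `MASKS` table, and the sample addresses are made up, and a VLAN number stands in for the source VLAN ifIndex that the interface-full mask adds.

```python
from collections import namedtuple

Packet = namedtuple("Packet", "src dst proto sport dport vlan")

# Each flow mask is modelled as the set of packet fields that form the table key.
MASKS = {
    "destination-source": lambda p: (p.src, p.dst),
    "full": lambda p: (p.src, p.dst, p.proto, p.sport, p.dport),
    "interface-full": lambda p: (p.vlan, p.src, p.dst, p.proto, p.sport, p.dport),
}

def table_entries(packets, mask):
    """Count the distinct table entries a list of packets would create under a mask."""
    return len({MASKS[mask](p) for p in packets})

# Hypothetical traffic between two hosts: one telnet session and two web sessions,
# with one web session also seen arriving on a second VLAN.
SAMPLE = [
    Packet("10.0.0.1", "10.0.0.2", "tcp", 40001, 23, 90),
    Packet("10.0.0.1", "10.0.0.2", "tcp", 40001, 23, 90),
    Packet("10.0.0.1", "10.0.0.2", "tcp", 40002, 80, 90),
    Packet("10.0.0.1", "10.0.0.2", "tcp", 40003, 80, 90),
    Packet("10.0.0.1", "10.0.0.2", "tcp", 40002, 80, 91),
]
```

For `SAMPLE`, `table_entries` gives 1 entry with destination-source, 3 with full, and 4 with interface-full. Real traffic behaves the same way at scale: the number of entries depends on how many distinct keys the mask produces, not on the traffic volume.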
Hope to help
Giuseppe
01-19-2010 07:46 AM
Giuseppe,
The configuration we have was given to us by Mazu when we originally put the Mazu devices on the network. What would the drawback be if we used something other than interface-full? Would we have less information about the traffic? We use Mazu to see the traffic flows and to check whether we have any issues on the network.
01-19-2010 03:09 PM
Hello Zoltan,
If the objective of NetFlow collection is to gather data for security purposes, defining flows at Layer 4, including protocol and ports, is highly desirable.
If so, you can monitor the current situation after the change to the fast aging timers.
If the objective is just to classify traffic to and from the Internet, a destination-source mask can be enough.
I see Mazu has been bought by Riverbed.
I agree that application performance analysis requires at least the full flow mask.
Hope to help
Giuseppe
09-08-2010 11:33 AM
Giuseppe,
Thank you for the detailed reply above. I'm facing exactly the same issue (but with no traffic outage) and was looking for possible workarounds, so this helps a lot.
Regards,
Titus