SPAN port out-discard counter

rttsui · 07-05-2002

We found a significant volume of out-discards for the SPAN ports of some of our 4230 sensors. The ports are 100 Mbps full duplex.

Does it mean that the sensor cannot keep up with the traffic, and is there anything we should do to fix it? I did not find any warnings in the errors.* files.

The following are our settings for fragment reassembly. Is any adjustment required?

ReassembleMaxDgrams 2000

ReassembleMaxFragmentsPerDgram 10000

ReassembleFragmentTimeout 30

ReassembleMaxTotalFragments 100000

Thanks, and hope you can help.

marcabal · 07-08-2002

Are the "out-discards" being reported on the switch itself?

Or are you talking about the 993 alarms on the IDS sensor?

--------------------------------------------------------------------------------------

If you are talking about "out-discards" on the switch itself, then it is not an issue of the sensor's performance, but rather the rated capacity of the NIC.

If you try spanning 102 Mbps to the 100 Mbps NIC, then the switch will discard at least 2 Mbps.

Even if you are not spanning over 100 Mbps, it is still possible that the switch will drop the packets.

What can happen is that you may receive packets from 10 different ports that are all being spanned to the sensor. The first packet will be sent to the sensor over its 100 Mbps link, but the other nine will be buffered. If the buffer isn't large enough to hold the other nine, then any that don't fit will be dropped by the switch.

Even if the buffer is big enough for those nine, the first of those nine in the buffer will be sent to the sensor, leaving 8 in the buffer. But if 5 more packets come in while that one is being sent, then those additional packets could fill the buffer and some of them may be dropped. This is what I usually call short burst traffic. It winds up being less than 100 Mbps total, but because of the short bursts you wind up with packets being dropped.

We have seen this situation in the lab where packets are spanned from Gig ports to the 100 Mbps port, even when the Gig port had less than 100Mbps on it.

We have also seen it when spanning multiple 100 Mbps ports to a single 100 Mbps port, even when the aggregate traffic was less than 100 Mbps.

It is an issue with the buffer size on the port within the switch, and the speed of the interface.
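To make the short burst effect concrete, here is a minimal sketch of a single SPAN destination port. It is only a toy model, not how any particular switch works: time is divided into ticks, the link sends one packet per tick, and `simulate_span_port`, `buffer_slots`, and `drain_per_tick` are names made up for this illustration.

```python
def simulate_span_port(arrivals, buffer_slots, drain_per_tick=1):
    """Toy model of one egress port with a fixed-size buffer.

    arrivals: packets arriving in each tick from all spanned source ports
    buffer_slots: packets the port can hold while waiting to transmit
    drain_per_tick: packets the link can send per tick (assumed constant)
    Returns (sent, discarded, still_queued).
    """
    queued = 0
    sent = 0
    discarded = 0
    for burst in arrivals:
        waiting = queued + burst
        # The link transmits what it can during this tick.
        out = min(waiting, drain_per_tick)
        sent += out
        waiting -= out
        # Whatever does not fit in the buffer is discarded by the switch.
        if waiting > buffer_slots:
            discarded += waiting - buffer_slots
            waiting = buffer_slots
        queued = waiting
    return sent, discarded, queued


# Ten packets in one burst, then silence: 50% average load over 20 ticks.
bursty = simulate_span_port([10] + [0] * 19, buffer_slots=4)

# The same ten packets spread evenly over the same 20 ticks.
smooth = simulate_span_port([1, 0] * 10, buffer_slots=4)
```

Both runs offer the same average load, well under the link rate, but only the bursty one produces discards, because the burst overflows the buffer before the link can drain it.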

The newly released IDS-4235 is rated as a 200 Mbps performance sensor, but it technically has a 1 Gbps copper NIC. The 1 Gbps NIC can help prevent this issue.

---------------------------------------------------------------------------------------------------------------

If, however, you are talking about a specific alarm on the sensor, or data from the sensor, then please provide an example of the alarm or data. Then we can describe what may be occurring.

rttsui · 07-12-2002

Thanks for the very clear explanation!

It was the 'out-discards' reported on the switch that I had in mind. We are checking to see whether the buffer size of the SPAN ports involved could be increased. We will also look into the 4235 sensor as you suggested.

Would you please also elaborate on signature 993 and the nrget statistics?

One of the sensors which has 'out-discards' numbers also shows packets dropped (from nrget and signature 993). Does it indicate that this particular sensor is not keeping up with the packets received? Are there any additional actions you would recommend? Thanks.

Following is some data for your reference:

From Signature 993 (GMT time)

--------------------------------------------

2002/07/12.13:28:35 Dropped 3 percent this interval: 7218 out of 21501

2002/07/12.13:30:27 Dropped 2 percent this interval: 4888 out of 20867

2002/07/12.13:35:47 Dropped 3 percent this interval: 7922 out of 22022

2002/07/12.13:36:35 Dropped 1 percent this interval: 3915 out of 21018

2002/07/12.13:37:23 Dropped 2 percent this interval: 4269 out of 20250

2002/07/12.13:44:35 Dropped 1 percent this interval: 2132 out of 20296

From nrget command

------------------------------

*** Fri Jul 12 09:49:56 EDT 2002

Statistics from: 07/11/2002 17:23:07

Number of seconds: 59209

IP Packets: 216959239

Filtered Packets: 0

ICMP Packets: 163925

TCP Packets: 215361448

UDP Packets: 1425766

Other Packets: 7896

Bad IP Packets: 0

Bad ICMP Packets: 0

Bad TCP Packets: 2783

Bad UDP Packets: 0

Signature Objects: 103307 -- Deleted: 36209195

Number Of Src Objects: 9083

Number Of Dst Objects: 9030

Number Of Dual Objects: 12401

Number Of Quad Objects: 1641

Storage Objects -- Current: 32155 Of Max: 175000

Number Of TCP Streams: 6135

DLPI drop: 68624 (0.3%)

marcabal · 07-12-2002

The 993s do indicate that you are dropping packets.

With the 993 alarm, the packets are being forwarded from the switch to the sensor, but the sensor is unable to analyze them.

NOTE: The DLPI drop counter at the bottom of the StatisticsOfIp counts these dropped packets. These are packets that the NIC saw, but the driver was unable to pass to nr.packetd because nr.packetd didn't clear out its buffers fast enough.

The 993 alarm code looks at how much this counter changes between the start and stop of an interval. The 993 alarm code then calculates the percentage of drops against actual packets seen. If the percent is higher than 1 then it fires the 993 alarm.
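As a rough illustration of that calculation, here is a minimal sketch. It is not the sensor's actual code: the function names are made up, and details such as rounding and exactly what counts as "packets seen" are assumptions.

```python
def interval_drop_percent(drops_start, drops_end, packets_seen):
    """Percent of packets dropped during one interval.

    drops_start / drops_end: DLPI drop counter at the start and end
    packets_seen: packets seen during the interval (assumed denominator)
    """
    dropped = drops_end - drops_start
    if packets_seen <= 0:
        return 0.0
    return 100.0 * dropped / packets_seen


def should_fire_993(drops_start, drops_end, packets_seen, threshold_percent=1.0):
    """True when the interval's drop percentage is higher than the threshold."""
    percent = interval_drop_percent(drops_start, drops_end, packets_seen)
    return percent > threshold_percent


# Hypothetical counter readings, not taken from a real sensor.
busy = should_fire_993(1000, 1300, 20000)   # 300 drops in 20000 packets
quiet = should_fire_993(1000, 1100, 20000)  # 100 drops in 20000 packets
```

With these made-up readings the first interval works out to 1.5 percent and would fire, while the second works out to 0.5 percent and would not.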

NOTE2: The packets dropped by the switch itself are not calculated in the DLPI drop counter since they never make it to the sensor.

An occasional drop of only 1 to 2 percent is probably nothing to worry about. The sensor is being pushed to its maximum processing potential at those times.

A consistent drop of 1 to 2 percent may need to be looked into.

Drops of 5 percent or more should be looked into.

Things that could cause the sensor to drop packets:

1) IP Logging - IP Logging can consume memory and cpu as the packets have to be written to the harddrive. For this reason IP Logging may need to be minimized on sensors that are dropping packets.

2) Large Number of Log Files - Sapd is the process that maintains the log files. If there are a large number of log files in the var/dump directory, then it can consume cpu as sapd tries to count and determine the disk space utilized by the log files. Manually deleting these old log files yourself, and configuring sapd to delete the log files sooner (like at 50% disk utilization instead of 90%) could help prevent sapd from consuming the cpu.

3) Large alarm rates - If you are logging level 1 alarms on the sensor, and several thousand are firing every minute, then this can consume cpu. You may want to limit the types of low level 1 alarms that get logged on the sensor.

4) Certain alarms are also highly cpu intensive - Some of the alarms are highly cpu intensive to analyze. If those alarms are rarely seen or are for vulnerabilities for services that are rarely if ever run, then those alarms get disabled by default. You may want to compare your signature settings to the default signature settings in the etc/wgc/templates/packetd.conf file to see if you have any of these turned on unnecessarily.

5) Some types of custom signatures can be cpu intensive - Certain custom signatures can be cpu intensive. Adding custom web signatures to the TCP.STRING engine instead of the HTTP.STATE engine can consume cpu. HTTP.STATE was optimized for web traffic. So if you have custom signatures then try disabling them to see if that helps.

6) Using Custom String Matches also consumes cpu - Consider changing the Custom String Matches into Custom Signatures using the TCP.STRING engine.

The TCP.STRING engine has the Custom String Match functionality plus much more, as well as performance enhancements not available in Custom String Matches.

7) Monitoring near 100 Mbps - Any time the sensor nears its upper performance threshold, it may drop a few percent of packets every now and then.

8) Monitoring incomplete tcp connections - If packets are being dropped by the switch itself, then it could be that the sensor is monitoring incomplete tcp sessions. Incomplete sessions are harder to monitor because they remain open and eventually have to time out, since the sensor doesn't see the end of the connection. This results in extra memory and processing for these incomplete sessions while the sensor is waiting for them to time out.
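For item 2, a quick way to see whether the log directory has grown large is a small script along these lines. This is only a sketch that would have to be run wherever Python is available: `summarize_log_dir` is a made-up helper, the directory is whatever path you pass in, and the 7-day cutoff is an arbitrary assumption. It only reports and does not delete anything.

```python
import os
import time


def summarize_log_dir(path, older_than_days=7, now=None):
    """Return (file_count, total_bytes, names_of_old_files) for a directory.

    older_than_days is an arbitrary cutoff chosen for illustration.
    now can be supplied to make the result repeatable.
    """
    now = time.time() if now is None else now
    cutoff = now - older_than_days * 86400
    count = 0
    total_bytes = 0
    old_files = []
    for entry in os.scandir(path):
        if not entry.is_file():
            continue  # skip subdirectories and other non-files
        info = entry.stat()
        count += 1
        total_bytes += info.st_size
        if info.st_mtime < cutoff:
            old_files.append(entry.name)
    return count, total_bytes, sorted(old_files)
```

A large file count or a long list of old files suggests the directory is worth cleaning up before looking at the other items.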

These are just a few things off the top of my head that might possibly be slowing the sensor's performance down.

One, two, or maybe none of them may be issues for you.

If you consistently drop packets at the sensor, and can't reduce or eliminate the drops based upon my discussion above then you may want to consider upgrading to the IDS-4235 or IDS-4250.

rttsui · 07-12-2002

I will work on the points you mentioned. Thanks very much for your suggestions!