Solved: Re: Major ASA 5510 Issue / Input errors / Overruns

RICK MANCINELLI · ‎08-25-2011

Hello-

We have an ASA 5510 that has been in production for some time now and all has been well. Traffic on it has been increasing over time, but nothing outrageous. Two days ago we began taking MAJOR input errors (every single one is an overrun) on our inside interface. The errors come in LARGE lumps - 100k, 200k, 300k at a time. I have attached a summary of timestamps and input error counts to demonstrate what I am talking about.

"sh blocks" looks very good:

 SIZE    MAX    LOW    CNT
    0    400    399    400
    4    200    199    199
   80    725    702    725
  256   2412   2374   2411
 1550   2932   2635   2673
 2048    600    567    600
 2560    900    899    900
 4096    100    100    100
 8192    100    100    100
16384    102    102    102
65536     16     16     16

"sh traffic" looks fine as well:

inside:
    received (in 683.730 secs):
        87210 packets 33517539 bytes
        127 pkts/sec 49021 bytes/sec
    transmitted (in 683.730 secs):
        1979502 packets 243386175 bytes
        2895 pkts/sec 355968 bytes/sec
    1 minute input rate 138 pkts/sec, 101261 bytes/sec
    1 minute output rate 2449 pkts/sec, 556063 bytes/sec
    1 minute drop rate, 0 pkts/sec
    5 minute input rate 127 pkts/sec, 64917 bytes/sec
    5 minute output rate 1874 pkts/sec, 335035 bytes/sec
    5 minute drop rate, 0 pkts/sec

"sh cpu" eliminates CPU hog as a potential issue:

CPU utilization for 5 seconds = 4%; 1 minute: 6%; 5 minutes: 6%

I cannot figure out how an interface that is moving only about 3000 pkts/s can suddenly take 100,000+ input errors in a 1-second period.
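
As a sanity check on the averages in the "sh traffic" output above, here is a minimal Python sketch that recomputes the per-second rates from the raw counters and compares them with one 100,000-packet lump. The counters are copied from the output above; the function name and the idea of comparing a burst against the average are only illustrative.

```python
# Counters copied from the "sh traffic" output for the inside interface.
INTERVAL_SECS = 683.730


def per_second(total, seconds):
    """Average rate over the sampling window."""
    return total / seconds


rx_pps = per_second(87210, INTERVAL_SECS)
tx_pps = per_second(1979502, INTERVAL_SECS)
combined_pps = rx_pps + tx_pps

print(f"rx {rx_pps:.0f} pps, tx {tx_pps:.0f} pps, combined {combined_pps:.0f} pps")

# One lump of 100,000 errors within a second, relative to the average rate.
burst_ratio = 100_000 / combined_pps
print(f"a 100k-packet burst is roughly {burst_ratio:.0f}x the average rate")
```

The averages are taken over more than eleven minutes, so a burst that lasts a second or less barely moves them. That is one way the rate counters can look healthy while the error counter jumps.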

So far we have:

- replaced the cable (three times)

- moved switch ports

- moved connection to another physical switch

- upgraded to ASA 8.2.5

Any thoughts?

Rick

varrao · ‎08-25-2011

Hi Rick,

Can you provide the output of the following:

- clear interface, then three outputs of show interface taken at one-minute intervals

- clear asp drop, then three outputs of show asp drop

What effects have these errors had on your network?
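
The point of clearing the counters and sampling three times is to see how much the overrun counter grows between samples. A minimal Python sketch of that comparison follows. The sample lines are made up and only shaped like the error line of show interface; the regular expression and function names are assumptions for illustration.

```python
import re

# Illustrative text only: made-up numbers in the shape of the
# "input errors" line from "show interface".
SAMPLES = [
    "0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort",
    "212044 input errors, 0 CRC, 0 frame, 212044 overrun, 0 ignored, 0 abort",
    "212044 input errors, 0 CRC, 0 frame, 212044 overrun, 0 ignored, 0 abort",
]

OVERRUN_RE = re.compile(r"(\d+) overrun")


def overrun_count(text):
    """Extract the overrun counter from one sample."""
    match = OVERRUN_RE.search(text)
    if match is None:
        raise ValueError("no overrun counter found")
    return int(match.group(1))


def deltas(samples):
    """Growth of the overrun counter between consecutive samples."""
    counts = [overrun_count(s) for s in samples]
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


print(deltas(SAMPLES))
```

A large delta in one interval followed by zero in the next would point to short bursts rather than a steady error rate.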

Thanks,

Varun

RICK MANCINELLI · ‎08-25-2011

Ok, so I ran the clear int and then did show int three times at one minute intervals. I followed that with a clear asp drop followed by show asp drop at three minute intervals. I finished with two more show int.

The impact this issue is having on our network is horrible. Internet connectivity is horrific, and remote desktop usage is all but impossible.

I am noticing the 20k pps output rate from time to time. This is somewhat concerning because we do not have anything that should be generating that level of traffic. (Still, the ASA is supposedly rated for 190k pps, so this shouldn't be an issue). Further, the outside interface does not report anything near that level of traffic, which is even more puzzling. That interface (inside) does have several sub-interfaces each on their own VLAN.

(will post output in next message, getting error about "message cannot be displayed due to its content")

RICK MANCINELLI · ‎08-25-2011

Attached is the output. I could not paste it into a message for some reason.

varrao · ‎08-25-2011

Hi Rick,

These numbers are very large. I would recommend checking the number of connections being built on the firewall; overruns occur only when traffic hits the firewall far faster than the ASA can process the packets.

What do show conn and show conn count report? Are those numbers also very high?

What we need to identify is which traffic the firewall is dropping. There is also a new feature introduced in version 8.2.5, flow control, which is disabled by default. It was introduced to improve performance under high traffic. Flow control is the process of managing the pacing of data transmission between two nodes to prevent a fast sender from outrunning a slow receiver, and we can try enabling it on the ASA. The following link explains how to enable it:

http://www.cisco.com/en/US/docs/security/asa/asa82/configuration/guide/intrface.html

I am not sure whether this would alleviate the issue completely, but overruns are only encountered when the ASA is overwhelmed by incoming traffic, so it fails to process those packets and reports overruns on the interface.
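
To illustrate the fast-sender, slow-receiver idea, here is a rough Python sketch of a fixed-size receive ring that is drained at a fixed rate. All numbers (ring size, drain rate, burst size) are hypothetical and are not actual ASA values, and the pause mode is only a crude stand-in for flow control, not a model of how the ASA implements it.

```python
def simulate(arrivals, ring_size, drain_per_tick, pause=False):
    """Return (overruns, backlog) for a receive ring fed by per-tick arrivals.

    With pause=True the sender holds back whatever does not fit;
    otherwise the excess is dropped and counted as overruns.
    """
    queued = 0    # packets waiting in the ring
    backlog = 0   # packets the sender is holding back (pause mode only)
    overruns = 0
    for arriving in arrivals:
        offered = arriving + backlog
        accepted = min(offered, ring_size - queued)
        excess = offered - accepted
        if pause:
            backlog = excess
        else:
            overruns += excess
            backlog = 0
        queued += accepted
        queued -= min(queued, drain_per_tick)
    return overruns, backlog


# Hypothetical traffic: quiet link with a single 12,000-packet burst.
traffic = [3] * 50 + [12000] + [3] * 50

no_pause = simulate(traffic, ring_size=256, drain_per_tick=100)
with_pause = simulate(traffic, ring_size=256, drain_per_tick=100, pause=True)
print("without pause:", no_pause)
print("with pause:   ", with_pause)
```

In this toy model the drain rate is far above the average arrival rate, yet the single burst still overflows the ring. Pausing the sender removes the overruns but leaves the packets queued upstream, so it moves the problem rather than removing the burst.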

Thanks,

Varun

RICK MANCINELLI · ‎08-25-2011

Varun-

I will take a look at the link. Also, check out the attached image! It seems that things chug along nicely and then, suddenly, there is a HUGE traffic spike on the "inside" interface. When I say huge, I mean we go from an average of less than 250pps to over 11k pps and then immediately back down to normal. At the same time, this is when the input/overrun errors are logged. I have the graphs side by side.

So now the question becomes: what can possibly be generating this traffic? It is "outbound" from the inside interface. There is NO corresponding traffic on the outside interface in either direction. If the traffic didn't come in to the outside interface and it didn't come in to the inside interface, then how the heck is it being sent OUT of the inside interface? In other words, where is it coming from?

Perhaps I need to set up some packet captures and see if I can figure it out. Any other ideas? Is it possible that this is a failing NIC on the ASA?

Rick

RICK MANCINELLI · ‎08-25-2011

Just a quick follow up. As we continue to watch, we have seen two spikes above 25k pps, and one as high as 60k pps. Again, no corresponding traffic on the outside interface.

This is all OUTBOUND from the inside interface.

It is almost as if the traffic is originating from the ASA itself!

Rick

varrao · ‎08-25-2011

Hi Rick,

We definitely need to verify what this traffic is: whether it is normal network traffic or malicious broadcast packets from a rogue machine. Captures would be the right option, along with the logs on the ASA, so I suggest we follow that route. Check in particular whether the packets are broadcasts.

Thanks,

Varun

manish arora · ‎08-25-2011

Hi Rick,

As Varun said, you will need to see what is generating that amount of traffic on the inside of your network using logs and captures. You can also enable "ip verify reverse-path" on the ASA so that it drops traffic generated from the inside network with bogus source IPs.
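
For anyone unfamiliar with the idea behind a reverse-path check, here is a minimal Python sketch of the logic: a packet passes only if the best route back to its source address points out the interface it arrived on. The route table and addresses are hypothetical, and this illustrates the concept only, not the ASA's implementation.

```python
import ipaddress

# Hypothetical routing view: networks reachable through each interface.
ROUTES = {
    "inside": [ipaddress.ip_network("10.1.0.0/16")],
    "outside": [ipaddress.ip_network("0.0.0.0/0")],
}


def passes_reverse_path(src, ingress):
    """True if the longest-prefix route back to src uses the ingress interface."""
    addr = ipaddress.ip_address(src)
    best_iface, best_len = None, -1
    for iface, nets in ROUTES.items():
        for net in nets:
            if addr in net and net.prefixlen > best_len:
                best_iface, best_len = iface, net.prefixlen
    return best_iface == ingress


print(passes_reverse_path("10.1.20.5", "inside"))    # genuine inside host
print(passes_reverse_path("203.0.113.9", "inside"))  # bogus source on inside
```

A check like this only catches packets whose source address does not match the routing table. It does not limit traffic from hosts that use their real addresses.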

Also, you should verify the speed/duplex settings on the ASA's inside interface and on the device connected to it. I can see in your output that you have speed/duplex hard-coded on the ASA.

Manish

hobbe · ‎08-25-2011

Hi

May I suggest that you set up a sniffer and mirror the switch port that connects to the ASA and, if you have the hardware, any other port in the network as well. Wireshark is a free, well-working standard sniffer.

I am just concerned that you might be getting a network loop somewhere from time to time, and that this is causing the traffic to spike like that.
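
Once a capture exists, a first pass could be to count broadcast packets per source. Below is a minimal Python sketch that assumes the capture has already been exported to simple (second, source, destination) records; the records, addresses, and function name are all hypothetical.

```python
from collections import Counter

# Hypothetical records (second, source, destination), e.g. from a capture
# exported to CSV.
PACKETS = [
    (0, "10.1.1.10", "10.1.1.255"),
    (0, "10.1.1.10", "10.1.1.255"),
    (0, "10.1.1.10", "10.1.1.255"),
    (0, "10.1.1.20", "10.1.1.7"),
    (1, "10.1.1.30", "10.1.1.255"),
]
BROADCASTS = {"10.1.1.255", "255.255.255.255"}


def broadcast_talkers(packets, broadcasts):
    """Sources ranked by how many broadcast packets they sent."""
    per_source = Counter(src for _, src, dst in packets if dst in broadcasts)
    return per_source.most_common()


print(broadcast_talkers(PACKETS, BROADCASTS))
```

A source that dominates this list during a spike is the first machine to look at.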

Good luck

HTH

RICK MANCINELLI · ‎08-25-2011

Varun, others-

First I wanted to extend my sincere gratitude for your help in getting to the bottom of this issue. Capturing the packet data, I was able to identify THREE servers which were inexplicably sending out occasional, but very large, broadcast storms. The broadcasts were all Windows Browser Host Announcements. They were sent at 4 min, 8 min, and 12 min after bootup of the device and then again at 12 minute intervals. This is by design. However, instead of sending a single Host Announcement as they are supposed to, they were each sending some random number of packets in excess of 12,000. At times they would send as many as 60,000 packets! I suppose they REALLY wanted to make their presence known. LOL
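
For anyone trying to match spikes on a graph against server boot times, the schedule described above (4, 8, and 12 minutes after boot, then every 12 minutes) can be listed with a small Python sketch. The function name is made up, and the timings are taken only from the description in this post.

```python
def announcement_minutes(uptime_minutes):
    """Minutes after boot at which a host announcement is expected."""
    times = [t for t in (4, 8, 12) if t <= uptime_minutes]
    t = 24  # first announcement on the steady 12-minute cycle
    while t <= uptime_minutes:
        times.append(t)
        t += 12
    return times


print(announcement_minutes(60))
```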

The only thing in common with the three servers is that they are all Windows 2003 based. All have been up and in production for many years. Two were physical, one was virtual (a P2V). They all seemed to get sick at precisely the same time, which to me anyway, would indicate some sort of bug.

The ultimate fix was to simply disable NetBIOS on those boxes. Since then, no more traffic spikes, no errors on the ASA either.

I think the part that really confused me, and perhaps someone can shed some light on this, is that the ASA only showed outbound traffic on the inside interface. Why not inbound traffic, since the packets were broadcasts originating from somewhere else and destined for the inside network's broadcast address? Had the ASA shown these as "inbound" packets, I would have never suspected the ASA in the first place. I am clearly misunderstanding something...so always willing to learn something new if someone cares to explain it to me!

Thanks again and kudos to all for the help!

Rick

varrao · ‎08-26-2011

Hi Rick,

That's really awesome that you were able to nail down the issue. Those graphs with an exact periodic spike looked suspicious to me, so yes, capturing the traffic was a good job on your part.

Coming back to your confusion: if those servers are located on the internal LAN, then this traffic would definitely be outbound for the interface (leaving the inside interface), so you might be seeing it as outbound traffic. Or I could be wrong, because I am not really sure which data you are pointing to. Can you shed some more light on it, with the help of the data?

Thanks,

Varun
