Re: ASA5510 - Packet loss and underrun but low CPU

AK59 · ‎08-19-2021

Dear all,

I'm writing to you regarding a big headache I have with an active/passive ASA 5510 cluster.

Both have been updated to their latest version (9.1.7).

For 4 to 5 months now, we have had complaints from users because their IP phones reboot 5 to 6 times a day. This behaviour occurs when the IP phone can reach its server. The firewall sits between the server and the IP phone. We also have complaints about random latency when other devices at the same location as the IP phones try to reach their servers (which are at the same location as the IP phone server).

After some investigation, I observed a lot of CPU hogs on the Dispatch Unit, as well as overruns.

These are the statistics from the interface on the server side:

771705546 packets input, 169543247259 bytes, 0 no buffer
Received 8175 broadcasts, 0 runts, 0 giants
604 input errors, 0 CRC, 0 frame, 604 overrun, 0 ignored, 0 abort
0 pause input, 0 resume input
0 L2 decode drops
543216229 packets output, 132318002309 bytes, 0 underruns
0 pause output, 0 resume output
0 output errors, 0 collisions, 2 interface resets
0 late collisions, 0 deferred
0 input reset drops, 0 output reset drops, 0 tx hangs
input queue (blocks free curr/low): hardware (255/230)
output queue (blocks free curr/low): hardware (255/108)

These are the statistics from the interface on the device side:

528796173 packets input, 128631811627 bytes, 0 no buffer
Received 24794 broadcasts, 0 runts, 0 giants
256 input errors, 0 CRC, 0 frame, 256 overrun, 0 ignored, 0 abort
0 pause input, 0 resume input
0 L2 decode drops
762513350 packets output, 168836484510 bytes, 5829 underruns
0 pause output, 0 resume output
0 output errors, 0 collisions, 2 interface resets
0 late collisions, 0 deferred
1 input reset drops, 0 output reset drops, 0 tx hangs
input queue (blocks free curr/low): hardware (255/230)
output queue (blocks free curr/low): hardware (255/0)
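
To put the counters above in proportion, they can be turned into ratios. This is only a minimal Python sketch: the helper names (`error_ratio`, `summarize`) and the dictionary layout are made up for illustration, and the numbers are copied by hand from the two outputs above.

```python
# Counters copied from the two "show interface" outputs above.
STATS = {
    "server side": {"in_pkts": 771705546, "overruns": 604,
                    "out_pkts": 543216229, "underruns": 0},
    "device side": {"in_pkts": 528796173, "overruns": 256,
                    "out_pkts": 762513350, "underruns": 5829},
}


def error_ratio(errors: int, packets: int) -> float:
    """Return errors as a fraction of packets (0.0 if no packets were counted)."""
    return errors / packets if packets else 0.0


def summarize(stats: dict) -> dict:
    """Map each interface name to (overrun ratio on input, underrun ratio on output)."""
    return {
        name: (
            error_ratio(c["overruns"], c["in_pkts"]),
            error_ratio(c["underruns"], c["out_pkts"]),
        )
        for name, c in stats.items()
    }


if __name__ == "__main__":
    for name, (overrun, underrun) in summarize(STATS).items():
        print(f"{name}: overrun ratio {overrun:.2e}, underrun ratio {underrun:.2e}")
```

Averaged over the whole counter lifetime these ratios are very small, but that says nothing about how the errors are distributed in time: if they arrive in short bursts, the affected flows can still see noticeable loss.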

I don't have much throughput (averaging 30 Mbps overall), the CPU is fine (22%) and so is memory (350 MB out of 1024 MB). I'm averaging 6,000 connections.

Resource usage looks fine too, I guess:

Resource           Current   Peak    Limit
SSH Server         1         1       5
ASDM               1         1       30
Syslogs [rate]     288       3847    N/A
Conns              5729      11205   130000
Xlates             4         4       N/A
Hosts              4346      4370    N/A
Conns [rate]       281       1232    N/A
Inspects [rate]    317       919     N/A
Routes             36        36      unlimited

But I'm experiencing CPU hogs on the Dispatch Unit:

Process: Dispatch Unit, PROC_PC_TOTAL: 765852, MAXHOG: 65, LASTHOG: 3
LASTHOG At: 13:12:49 CEDT Aug 19 2021
PC: 0x082a4838 (suspend)

Process: Dispatch Unit, NUMHOG: 741108, MAXHOG: 65, LASTHOG: 3
LASTHOG At: 13:12:49 CEDT Aug 19 2021
PC: 0x082a4838 (suspend)
Call stack: 0x082a4838 0x0806a65c

Process: Dispatch Unit, PROC_PC_TOTAL: 547272, MAXHOG: 52, LASTHOG: 4
LASTHOG At: 13:12:50 CEDT Aug 19 2021
PC: 0x082a4a8c (suspend)

Process: Dispatch Unit, NUMHOG: 544708, MAXHOG: 52, LASTHOG: 4
LASTHOG At: 13:12:50 CEDT Aug 19 2021
PC: 0x082a4a8c (suspend)
Call stack: 0x082a4a8c 0x0806a65c

I also tried to make sense of the ASP drops:

Frame drop:
Flow is denied by configured rule (acl-drop) 392040
First TCP packet not SYN (tcp-not-syn) 274395
TCP failed 3 way handshake (tcp-3whs-failed) 1502
TCP RST/FIN out of order (tcp-rstfin-ooo) 215421
TCP RST/SYN in window (tcp-rst-syn-in-win) 35
ICMP Error Inspect no existing conn (inspect-icmp-error-no-existing-conn) 4
Dropped pending packets in a closed socket (np-socket-closed) 54

Last clearing: 17:18:05 CEDT Aug 18 2021 by enable_15

Flow drop:
Inspection failure (inspect-fail) 3968
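
To see which drop reasons dominate, the frame-drop counters above can be ranked by their share of the total. This is a rough Python sketch, not an official tool: the regular expression assumes each counter line ends with `(reason-name) count` as in the paste above, and the function names are hypothetical.

```python
import re

# Frame-drop lines copied from the "show asp drop" output above
# (the single flow-drop counter, inspect-fail, is left out on purpose).
ASP_DROP_OUTPUT = """\
Flow is denied by configured rule (acl-drop) 392040
First TCP packet not SYN (tcp-not-syn) 274395
TCP failed 3 way handshake (tcp-3whs-failed) 1502
TCP RST/FIN out of order (tcp-rstfin-ooo) 215421
TCP RST/SYN in window (tcp-rst-syn-in-win) 35
ICMP Error Inspect no existing conn (inspect-icmp-error-no-existing-conn) 4
Dropped pending packets in a closed socket (np-socket-closed) 54
"""

# Assumed line shape: free text, then "(reason-name)", then the counter.
LINE_RE = re.compile(r"\((?P<name>[a-z0-9-]+)\)\s+(?P<count>\d+)\s*$")


def parse_asp_drops(text: str) -> dict:
    """Extract {reason-name: count} from pasted ASP drop lines."""
    counters = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            counters[match.group("name")] = int(match.group("count"))
    return counters


def ranked_shares(counters: dict) -> list:
    """Return (name, count, share of total) tuples, largest counter first."""
    total = sum(counters.values())
    ordered = sorted(counters.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, count, count / total) for name, count in ordered]


if __name__ == "__main__":
    for name, count, share in ranked_shares(parse_asp_drops(ASP_DROP_OUTPUT)):
        print(f"{name:40s} {count:>8d} {share:6.1%}")
```

With these numbers, three reasons (acl-drop, tcp-not-syn and tcp-rstfin-ooo) account for nearly all frame drops since the last clearing.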

I just can't find where my problem is. Can someone help me, please?

balaji.bandi · ‎08-19-2021

Do you have a high-level network diagram showing how the device is connected to the switch?

Is the switch OK? Any logs?

If you reboot the ASA, does the problem go away?

Here is some troubleshooting guidance:

https://www.cisco.com/c/en/us/support/docs/security/asa-5500-x-series-next-generation-firewalls/115985-asa-overrun-product-tech-note-00.html

BB

***** Rate All Helpful Responses *****

How to Ask The Cisco Community for Help

AK59 · ‎08-19-2021

Well, I don't have a network diagram to show, but it goes like this:

Servers<=>Nexus<=>ASA<=>Router<=>Switches<=>Devices

The devices come from various locations and various switches.

The servers come from the same farm and the nexus interface statistics show no errors.

If there is a problem, it would be either the ASA or the router.

The ASA has been reloaded more than once...

balaji.bandi · ‎08-19-2021

After the ASA reboot, did that fix the issue, at least for some time?

Router<=>Switches<=>Devices

We need to check the router and switch logs.

BB


AK59 · ‎08-19-2021

It did, for about 24 hours. I don't have control over any of the routers or switches. I would like to focus on the ASA, as I suspect the problem comes from there.

balaji.bandi · ‎08-19-2021

That input is not enough for us to suggest anything. Until we see more information from the devices connected around the ASA, it is very hard to identify the issue (I am afraid I can't be of much help here).

BB


AK59 · ‎08-23-2021

I just checked the adjacent router (brand: Hirschmann): there are no errors on its interfaces, nor on the Nexus ones.

My next possibilities are:

Enable flow control. My question is: is it compatible with a non-Cisco device? Do I have to change any parameters on the adjacent device?
Switch to the secondary device. Is that worth doing, or will the problem remain across the cluster?
Change the port on the switch. Is it possible that a faulty NIC is the problem? (I have underruns and overruns on both interfaces, in and out.)

Thanks in advance,

AK59 · ‎08-29-2021

Hi everyone,

Here is a little update on the situation.

I haven't enabled flow control yet.

I did a failover to check whether the problem would occur on the second device, and it does: I still have underruns and overruns when running on the second device.

But now I'm really suspicious of the ASP drop rate. I get a lot of "First TCP packet not SYN" drops.

When I capture the traffic, the offending packets come from both sides (inside to outside and outside to inside).

johnlloyd_13 · ‎08-30-2021

hi,

this could be a HW oversubscription on your 5510.

was there any recent change in your network environment? i.e. additional VLAN/users or application eating up BW?

do you have port/traffic monitoring on the ASA? i.e. PRTG or solarwinds?

could you post the output of these commands:

show process cpu-usage sorted non-zero

show conn count

show local-host | in host|count/limit