08-19-2021 04:15 AM
Dear all,
I'm writing to you about a big headache I have with an active/passive ASA 5510 failover pair.
Both units have been updated to their latest version (9.1.7).
For the past 4 to 5 months we have had complaints from users because their IP phones reboot 5 to 6 times a day. This behaviour occurs when the IP phone can reach its server. The firewall sits between the server and the IP phones. We also have complaints about random latency when devices at the same location as the IP phones try to reach their servers (which are at the same location as the IP phone server).
After some investigation, I see a lot of CPU hogs on the Dispatch Unit, as well as interface overruns.
These are the statistics from the interface on the servers' side:
771705546 packets input, 169543247259 bytes, 0 no buffer
Received 8175 broadcasts, 0 runts, 0 giants
604 input errors, 0 CRC, 0 frame, 604 overrun, 0 ignored, 0 abort
0 pause input, 0 resume input
0 L2 decode drops
543216229 packets output, 132318002309 bytes, 0 underruns
0 pause output, 0 resume output
0 output errors, 0 collisions, 2 interface resets
0 late collisions, 0 deferred
0 input reset drops, 0 output reset drops, 0 tx hangs
input queue (blocks free curr/low): hardware (255/230)
output queue (blocks free curr/low): hardware (255/108)
These are the statistics from the interface on the devices' side:
528796173 packets input, 128631811627 bytes, 0 no buffer
Received 24794 broadcasts, 0 runts, 0 giants
256 input errors, 0 CRC, 0 frame, 256 overrun, 0 ignored, 0 abort
0 pause input, 0 resume input
0 L2 decode drops
762513350 packets output, 168836484510 bytes, 5829 underruns
0 pause output, 0 resume output
0 output errors, 0 collisions, 2 interface resets
0 late collisions, 0 deferred
1 input reset drops, 0 output reset drops, 0 tx hangs
input queue (blocks free curr/low): hardware (255/230)
output queue (blocks free curr/low): hardware (255/0)
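To put the overrun and underrun counters in proportion, the numbers from the two outputs above can be turned into errors per million packets. This is only a rough sketch: the dictionary layout and the `ratio_ppm` helper are made up for illustration, and only the counter values come from the outputs above.

```python
# Counters copied from the two "show interface" outputs above.
interfaces = {
    "server side": {
        "packets_in": 771705546, "overruns": 604,
        "packets_out": 543216229, "underruns": 0,
    },
    "device side": {
        "packets_in": 528796173, "overruns": 256,
        "packets_out": 762513350, "underruns": 5829,
    },
}


def ratio_ppm(errors, packets):
    """Return errors per million packets (0.0 if no packets were counted)."""
    return errors / packets * 1_000_000 if packets else 0.0


# Overruns are compared to input packets, underruns to output packets.
report = {
    name: {
        "overrun_ppm": ratio_ppm(c["overruns"], c["packets_in"]),
        "underrun_ppm": ratio_ppm(c["underruns"], c["packets_out"]),
    }
    for name, c in interfaces.items()
}

for name, r in report.items():
    print(f"{name}: {r['overrun_ppm']:.2f} overruns per million in, "
          f"{r['underrun_ppm']:.2f} underruns per million out")
```

These counters are cumulative, so a low average ratio does not rule out short bursts in which many packets are lost at once.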
I don't have much throughput (averaging 30 Mbps overall), the CPU is fine (22%) and so is memory (350 MB out of 1024 MB). I'm averaging 6,000 connections.
Resource usage looks fine too, I guess:
Resource           Current    Peak    Limit
SSH Server               1       1        5
ASDM                     1       1       30
Syslogs [rate]         288    3847      N/A
Conns                 5729   11205   130000
Xlates                   4       4      N/A
Hosts                 4346    4370      N/A
Conns [rate]           281    1232      N/A
Inspects [rate]        317     919      N/A
Routes                  36      36   unlimited
But I'm seeing CPU hogs on the Dispatch Unit:
Process: Dispatch Unit, PROC_PC_TOTAL: 765852, MAXHOG: 65, LASTHOG: 3
LASTHOG At: 13:12:49 CEDT Aug 19 2021
PC: 0x082a4838 (suspend)
Process: Dispatch Unit, NUMHOG: 741108, MAXHOG: 65, LASTHOG: 3
LASTHOG At: 13:12:49 CEDT Aug 19 2021
PC: 0x082a4838 (suspend)
Call stack: 0x082a4838 0x0806a65c
Process: Dispatch Unit, PROC_PC_TOTAL: 547272, MAXHOG: 52, LASTHOG: 4
LASTHOG At: 13:12:50 CEDT Aug 19 2021
PC: 0x082a4a8c (suspend)
Process: Dispatch Unit, NUMHOG: 544708, MAXHOG: 52, LASTHOG: 4
LASTHOG At: 13:12:50 CEDT Aug 19 2021
PC: 0x082a4a8c (suspend)
Call stack: 0x082a4a8c 0x0806a65c
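For comparing the hog counters between runs (for example before and after a failover), the NUMHOG lines above can be pulled into a small structure. This is a rough sketch: the function name and regular expressions are made up for illustration, and it assumes the output keeps the layout shown above. As far as I know MAXHOG and LASTHOG are reported in milliseconds, but treat that as an assumption.

```python
import re

# NUMHOG lines and their PC lines, copied from the output above.
hog_output = """\
Process: Dispatch Unit, NUMHOG: 741108, MAXHOG: 65, LASTHOG: 3
PC: 0x082a4838 (suspend)
Process: Dispatch Unit, NUMHOG: 544708, MAXHOG: 52, LASTHOG: 4
PC: 0x082a4a8c (suspend)
"""

HOG = re.compile(
    r"Process: (?P<proc>.+?), NUMHOG: (?P<num>\d+), "
    r"MAXHOG: (?P<max>\d+), LASTHOG: (?P<last>\d+)"
)
PC = re.compile(r"PC: (?P<pc>0x[0-9a-f]+)")


def parse_hogs(text):
    """Return one dict per NUMHOG entry, with the PC that follows it."""
    entries = []
    current = None
    for line in text.splitlines():
        m = HOG.search(line)
        if m:
            current = {
                "process": m.group("proc"),
                "numhog": int(m.group("num")),
                "maxhog": int(m.group("max")),
                "lasthog": int(m.group("last")),
                "pc": None,
            }
            entries.append(current)
            continue
        m = PC.search(line)
        if m and current is not None:
            current["pc"] = m.group("pc")
    return entries


hogs = parse_hogs(hog_output)
total_hogs = sum(h["numhog"] for h in hogs)
worst = max(h["maxhog"] for h in hogs)
print(f"{total_hogs} hog events in total, longest single hog value: {worst}")
```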
I also looked at the ASP drops to try to figure it out:
Frame drop:
Flow is denied by configured rule (acl-drop) 392040
First TCP packet not SYN (tcp-not-syn) 274395
TCP failed 3 way handshake (tcp-3whs-failed) 1502
TCP RST/FIN out of order (tcp-rstfin-ooo) 215421
TCP RST/SYN in window (tcp-rst-syn-in-win) 35
ICMP Error Inspect no existing conn (inspect-icmp-error-no-existing-conn) 4
Dropped pending packets in a closed socket (np-socket-closed) 54
Last clearing: 17:18:05 CEDT Aug 18 2021 by enable_15
Flow drop:
Inspection failure (inspect-fail) 3968
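To see which drop reasons dominate, the frame-drop counters above can be ranked by their share of the total. This is a minimal sketch: the parsing regex and variable names are made up for illustration, and it assumes each counter line has the form "description (name) count" as in the output above.

```python
import re

# Frame-drop lines copied from the ASP drop output above.
asp_frame_drops = """\
Flow is denied by configured rule (acl-drop) 392040
First TCP packet not SYN (tcp-not-syn) 274395
TCP failed 3 way handshake (tcp-3whs-failed) 1502
TCP RST/FIN out of order (tcp-rstfin-ooo) 215421
TCP RST/SYN in window (tcp-rst-syn-in-win) 35
ICMP Error Inspect no existing conn (inspect-icmp-error-no-existing-conn) 4
Dropped pending packets in a closed socket (np-socket-closed) 54
"""

LINE = re.compile(r"^(?P<desc>.+?) \((?P<name>[a-z0-9-]+)\)\s+(?P<count>\d+)$")


def parse_drops(text):
    """Map each drop-reason name to its counter value."""
    drops = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            drops[m.group("name")] = int(m.group("count"))
    return drops


drops = parse_drops(asp_frame_drops)
total = sum(drops.values())
ranked = sorted(drops.items(), key=lambda kv: kv[1], reverse=True)

for name, count in ranked:
    print(f"{name:40s} {count:>8d} {count / total:6.1%}")
```

The counters cover the period since the last clearing (17:18:05 on Aug 18 in the output above), so comparing two snapshots taken a known time apart gives a better idea of the actual drop rate than a single reading.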
I just can't find where my problem is... Can someone help me, please?
08-19-2021 04:32 AM
Do you have a high-level network diagram showing how the device is connected to the switch?
Is the switch OK? Any logs?
If you reboot the ASA, does the problem go away?
Here is some troubleshooting:
08-19-2021 06:01 AM
Well, I don't have a network diagram to show, but it goes like this:
Servers<=>Nexus<=>ASA<=>Router<=>Switches<=>Devices
The devices come from various locations and various switches.
The servers come from the same farm and the nexus interface statistics show no errors.
If there is a problem, it would be either the ASA or the router.
The ASA has been reloaded more than once...
08-19-2021 06:20 AM
After the ASA reboot, did that fix the issue, at least for some time?
Router<=>Switches<=>Devices
We need to check the router and switch logs.
08-19-2021 06:27 AM
It did for about 24 hours... I don't have control over any of the routers or switches... I would like to focus on the ASA, as I suspect the problem comes from there.
08-19-2021 06:50 AM
The information so far is not enough for us to make a suggestion. Until we see more input from the devices connected around the ASA, it is very hard to identify the issue (I am afraid I can't be of any help here).
08-23-2021 07:12 AM
I just checked the adjacent router (brand: Hirschmann); there are no errors on its interfaces, nor on the Nexus ones.
My next two possibilities are:
Thanks in advance,
08-29-2021 06:14 AM
Hi everyone,
Here is a little update on the situation.
I haven't enabled flow control yet.
I did a failover to check whether the problem would also occur on the second device, and it does: I still see underruns and overruns when running on the second device.
But now I'm really suspicious of the ASP drop rate. I get a lot of "First TCP packet not SYN" drops.
When I capture the traffic, the packets in error are from both sides... ( inside to outside and outside to inside)
08-30-2021 07:19 PM
Hi,
This could be hardware oversubscription on your 5510.
Was there any recent change in your network environment, i.e. additional VLANs/users or an application eating up bandwidth?
Do you have port/traffic monitoring on the ASA, e.g. PRTG or SolarWinds?
Could you post the output of these commands:
show process cpu-usage sorted non-zero
show conn count
show local | in host|count/limit