11-04-2024 03:54 AM - edited 11-04-2024 03:55 AM
Hi to all,
I am posting this to get your opinion.
Today our users behind the FTD faced timeouts as well as high RTTs.
Digging a little, I noticed that CPU core 16 (and none of the other cores) was pinned at 100%.
After disabling the IPS policy for the outgoing traffic, the timeouts stopped and the RTTs returned to normal.
So I decided to keep IPS inspection only for the incoming traffic.
How could I identify the offending host or hosts? In addition, could this be caused by elephant flows passing through the firewall, or perhaps by a huge backup from inside to the Internet?
Any views/opinions are most welcome.
Thanks
Ditter.
11-04-2024 04:03 AM
What FTD hardware are you running and what version software is installed on the FTD?
I have seen high latency being caused by Elephant flows and enabling Elephant flow remediation or sending that traffic outside of the IPS solves the issue. Do you have Elephant flow detection enabled? If yes you can search the "Analysis" logs for Elephant flows and see which source IPs were causing it.
But seeing a single core at 100% is normal at times. The core number will change from time to time also. When CPU becomes a problem is when several or all CPU cores are at 100%.
11-04-2024 05:19 AM
Hi Marius and @MHM Cisco World
Thank you for your response.
I am running 7.2.8 on the FTD cluster:
> show version
---------------------[ ftd-1 ]----------------------
Model : Cisco Firepower 2140 Threat Defense (77) Version 7.2.8 (Build 25)
UUID : 5857ad62-0bf5-11ed-b5a5-a5352e00b8f4
LSP version : lsp-rel-20241030-1856
VDB version : 397
And I am running 7.4.2 on the FMC:
> show version
----------------------[ fmc ]-----------------------
Model : Secure Firewall Management Center for VMware (66) Version 7.4.2 (Build 172)
UUID : 0be5b5be-bc49-11ed-8b60-038ff8fad965
Rules update version : 2024-10-30-001-vrt
LSP version : lsp-rel-20241030-1856
VDB version : 397
What I noticed is that, although I had enabled elephant flow detection, I had not enabled the bypass option in the same menu.
However, IAB was active.
But I cannot find any elephant flows in the Analysis menu, even when I filter on the "Reason" field for Elephant Flows.
Please see the attached PNGs.
Thanks,
Ditter.
11-04-2024 05:59 AM
When I had a TAC case on this, they said it was remediated in version 7.2.5. That said, it is quite possible that the issue was not actually solved or was reintroduced.
The thing with the FTD2000 series is that although you can enable Elephant flow detection, there is no remediation even if you enable it. For remediation you would need to exchange the FTD2140 with either FTD1000, FTD3000 or FTD4100.
A suggestion from me would be to upgrade to the latest star version which is 7.4.2.1. This will no doubt be a suggestion from TAC should you open a case with them.
11-04-2024 04:10 AM
show asp inspect-dp snort
11-04-2024 05:23 AM
Hi MHM,
on my primary FTD:
> show asp inspect-dp snort
SNORT Inspect Instance Status Info
Id      Pid    Conns       Segs/Pkts   Status
--      -----  ----------  ----------  ----------
0       32159  1.9 K       0           READY
1       32162  1.9 K       0           READY
2       32166  2 K         0           READY
3       32183  2 K         0           READY
4       32164  1.9 K       0           READY
5       32165  1.9 K       0           READY
6       32185  2 K         0           READY
7       32138  2 K         0           READY
8       32186  2 K         0           READY
9       32156  1.9 K       0           READY
10      32140  1.9 K       0           READY
11      32158  2 K         1           READY
12      32187  2 K         0           READY
--      -----  ----------  ----------  ----------
Summary        25.4 K      1
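For anyone who wants to compare the Snort instances programmatically, here is a minimal Python sketch that parses pasted `show asp inspect-dp snort` output and reports how evenly connections are spread. It assumes the column layout shown above and that the "K" suffix means thousands; the function names are made up for illustration and are not part of any Cisco tooling.

```python
import re

# A few rows copied from the output above. "K" is ASSUMED to mean thousands.
SAMPLE = """\
Id Pid Conns Segs/Pkts Status
-- ----- ---------- ---------- ----------
0 32159 1.9 K 0 READY
1 32162 1.9 K 0 READY
2 32166 2 K 0 READY
11 32158 2 K 1 READY
-- ----- ---------- ---------- ----------
Summary 25.4 K 1
"""

UNITS = {"": 1, "K": 1_000, "M": 1_000_000, "G": 1_000_000_000}

# One data row: Id, Pid, Conns (optional unit), Segs/Pkts (optional unit), Status.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<pid>\d+)\s+"
    r"(?P<conns>[\d.]+)\s*(?P<cu>[KMG]?)\s+"
    r"(?P<segs>[\d.]+)\s*(?P<su>[KMG]?)\s+"
    r"(?P<status>\S+)\s*$"
)


def to_count(number, unit):
    """Convert a value such as ('1.9', 'K') into an integer count."""
    return int(round(float(number) * UNITS[unit]))


def parse_instances(text):
    """Return one dict per Snort instance row; other lines are skipped."""
    instances = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header, separator and Summary lines do not match
        instances.append({
            "id": int(m["id"]),
            "pid": int(m["pid"]),
            "conns": to_count(m["conns"], m["cu"]),
            "segs": to_count(m["segs"], m["su"]),
            "status": m["status"],
        })
    return instances


def imbalance(instances):
    """Ratio of the busiest instance's connection count to the mean."""
    conns = [i["conns"] for i in instances]
    return max(conns) / (sum(conns) / len(conns))


instances = parse_instances(SAMPLE)
print(len(instances), "instances, max/mean conns =", round(imbalance(instances), 3))
```

A ratio close to 1 means the connections are spread evenly, which is what the output above suggests. Note that this only looks at connection counts: a single very heavy flow could still keep one instance busy without showing up as an imbalance here.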
11-04-2024 05:59 AM
It would be interesting to see what type of traffic you are currently inspecting under that IPS policy. Even though no Elephant Flow events were triggered, the sheer volume of inspected traffic could have something to do with it. What are the top applications hitting that IPS rule at the moment?
11-04-2024 06:16 AM
When the problem first occurred, I was inspecting all outgoing traffic (that is, traffic going to the Internet) from around 1000 PCs. After the problem occurred, I stopped inspecting this traffic and the problem stopped, so currently I do not inspect outgoing traffic, only incoming traffic to specific ports.
11-04-2024 07:55 AM
OK, so you don't see this issue anymore. I guess one possible approach is to check whether your connection events from the period of high CPU usage are still available, and to create a report of the top Application Protocols that hit that IPS rule. That may provide some hints about where most of the inspection time went.
In general, application protocols that are not encrypted get the most inspection, so the top unencrypted protocols in that report can give some indication of what caused the high CPU usage.
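If the connection events for that time window can be exported to CSV, the kind of report described here can be sketched in a few lines of Python. This is only an illustration: the column names ("Initiator IP", "Application Protocol", "Initiator Bytes", "Responder Bytes") are assumed and may not match the headers in an actual export, and the sample rows are invented. Byte counts are used as a rough proxy for inspection load, not as a measure of CPU time.

```python
import csv
import io
from collections import Counter

# Invented sample rows, for illustration only (not data from this thread).
# Column names are ASSUMED; adjust them to whatever the real export uses.
SAMPLE_CSV = """\
Initiator IP,Application Protocol,Initiator Bytes,Responder Bytes
10.0.10.21,HTTPS,1200,54000
10.0.10.21,HTTP,800,910000
10.0.20.7,SMB,500000,2000
10.0.10.21,HTTP,700,450000
"""


def top_by(csv_text, key_column, n=5):
    """Sum bytes in both directions per value of key_column; return the top n."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[key_column]] += int(row["Initiator Bytes"]) + int(row["Responder Bytes"])
    return totals.most_common(n)


print("Top application protocols:", top_by(SAMPLE_CSV, "Application Protocol"))
print("Top initiators:", top_by(SAMPLE_CSV, "Initiator IP"))
```

Grouping the same export by initiator IP as well as by application protocol gives a first shortlist of hosts to look at, even when no elephant flow events were logged.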
11-04-2024 10:37 AM
Hi Ckleopa, I narrowed the observation window down to the high CPU load period and used the predefined searches to look for elephant flows, but I did not find anything. So I assume something else kept the CPU load at 100% (CPU core number 16 in particular). However, I don't know how to search further for the cause of the high CPU utilization, which affected all users with dropped packets and high RTTs. Thanks for your help.
11-04-2024 09:10 AM
When it happens again, check the Snort CPU usage.
To check Snort (not Lina) CPU health, use:
Root@firepower:/opt/cisco/csp/application# top
MHM
11-04-2024 10:42 AM
Currently, with no intense network traffic, the snort process keeps the CPU load at about 48%. As mentioned, it inspects traffic only in the incoming direction and only on high TCP/UDP ports.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28736 root 1 -19 17.4g 8.1g 3.0g S 48.7 12.9 326:50.34 snort3
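To keep an eye on this over time, a small Python sketch like the one below could parse captured `top` output and flag the snort3 process when it crosses a threshold. The threshold value and function names are assumptions chosen for illustration. Also keep in mind that, in top's default mode, %CPU is relative to a single core, so a multi-threaded process can show more than 100%; whether the FTD's top behaves this way is an assumption.

```python
# Header and process line copied from the top output above.
SAMPLE = """\
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28736 root 1 -19 17.4g 8.1g 3.0g S 48.7 12.9 326:50.34 snort3
"""


def parse_top(text):
    """Turn top's process table into a list of dicts keyed by column name."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    # Locate the header row; anything before it (summary lines) is ignored.
    start = next(i for i, fields in enumerate(lines) if fields[0] == "PID")
    header = lines[start]
    n = len(header)
    rows = []
    for fields in lines[start + 1:]:
        if len(fields) < n:
            continue  # not a complete process row
        # COMMAND may contain spaces, so join everything past the last column.
        values = fields[:n - 1] + [" ".join(fields[n - 1:])]
        rows.append(dict(zip(header, values)))
    return rows


def busy(rows, name="snort3", threshold=40.0):
    """Return (pid, cpu) for processes named `name` at or above `threshold` %CPU.

    The default threshold of 40.0 is an arbitrary example value.
    """
    return [
        (int(r["PID"]), float(r["%CPU"]))
        for r in rows
        if r["COMMAND"].startswith(name) and float(r["%CPU"]) >= threshold
    ]


print(busy(parse_top(SAMPLE)))
```

Running this against snapshots taken during a slowdown and during normal operation would give a baseline to compare against, instead of a single reading.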
11-07-2024 05:22 AM - edited 11-07-2024 06:22 AM
Hi to all,
The problems continued, with severe unresponsiveness even with no IPS policy applied in any rule; in addition, I also enabled No Rules Active in the IPS policy.
The users are still complaining about very slow network response, even though I checked the CPU and it was not high at all.
I got the following message (I am not sure whether it appeared before or after I activated No Rules Active in the IPS policy):
Module: Automatic Application Bypass Status
Description: [12132] Process '/ngfw/var/sf/detection_engines/f08edaa6-0bf5-11ed-9aa5-95282f00b8f4/snort3 --plugin-path /ngfw/var/sf/detection_engines/f08edaa6-0bf5-11ed-9aa5-95282f00b8f4/plugins:/ngfw/var/sf/lsp/active-so_rules --daq-dir /ngfw/usr/local/sf/lib/daq3 -M -Q -v -c /ngfw/var/sf/detection_engines/f08edaa6-0bf5-11ed-9aa5-95282f00b8f4/snort3.lua -l /ngfw/var/sf/detection_engines/f08edaa6-0bf5-11ed-9aa5-95282f00b8f4 --id-offset 1 --id-subdir --id-zero --run-prefix instance- --control-socket /ngfw/var/sf/detection_engines/f08edaa6-0bf5-11ed-9aa5-95282f00b8f4/snort3.sock --create-pidfile -s 1500 -z 13 ' bypassed.
The message does not seem very clear to me, as the IPS policy was not active in any rule. What I did to make things work again was to remove from the FTD a VLAN containing many users. This brought things back to normal and traffic started to flow again.
Looking at connection events and unified events, I could not find the offending host (or hosts).
How could I troubleshoot this situation and gain some insight, rather than just removing a VLAN simply because it contained many users? (Apparently my assumption was correct, but it was just an assumption.)
Thanks
Ditter.
11-07-2024 08:47 AM
As you shared above, the top process was Snort,
so it is a Snort issue.
Try reducing the Snort inspection level.
11-08-2024 12:24 AM
Thanks @MHM Cisco World and @ckleopa. Previously I had the Balanced mode enabled; following your suggestions, I have now activated Connectivity over Security (only 584 rules active now). In addition, I upgraded the FTDs to version 7.4.2.1-30 and the FMC to version 7.6.0-113.
I will also try the Cisco Recommended Rules, but I noticed that it asks me on which IPv4/IPv6 networks I want the IPS rules active. I have already configured the appropriate IPv4/IPv6 networks where the IPS policy is applied, so I do not understand why the system asks me this question.