Intelligent Application Bypass - is it working?

niko
Beginner

Hi,

I'm fighting with elephant flows, and at the moment I am losing the fight. I'm trying to set up the Intelligent Application Bypass (IAB) feature to avoid inspecting long-lasting, high-bandwidth flows, as they are spiking the CPU, but I cannot get it working despite trying different settings. The documentation explains how it works in general, but that does not help much when I configure the obvious settings and still get no hits.

For example, these are my settings and what I expect from them (I'm trying everything in Test mode):

Application: HTTP

State: Test

Sample Interval: 1 sec

Flow Velocity: 3000

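My expectation is that flows from the HTTP application (web downloads, etc.) faster than 3000 kbytes/sec (about 3 MB/s, since Flow Velocity is specified in kbytes/sec rather than Mbps) will be marked as bypassed in connection events.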

I have tried different sample intervals and different packets-per-flow and bytes-per-flow values, but with no real success.

Has anyone got this working successfully and can share their IAB config and the results?

Cheers!

P.S. Platform: ASA with FP module, sitting at 6.0.1.2

10 Replies

Gabriel Copil
Beginner

Hi,

I'm currently testing IAB as well (elephant flows being the reason), on 2 different ASA5585-SSP60 boxes with FP services v.6.1.0.3 (IAB in Test mode).

I'm generating elephant flows using FTP transfers that pass through 2 or 3 FP modules, which share the same Network Analysis Policy but have different Access Control Policies. I can tell you that it looks like it might work: at least I've managed to get connection events with Reason = "Intelligent App Bypass".

On one FP I have configured:

  • Performance Sample Interval = 2 seconds
  • Bypassable Applications and Filters = All apps including unidentified apps (this option requires v.6.1.0.3)
  • Inspection Performance Thresholds - Drop Percentage=1
  • Flow Bypass Thresholds - Flow Velocity = 45000 kbytes/sec

On this FP, one connection event appears with Reason = "Intelligent App Bypass" (under Analysis - Connections - Events), but the widget I built (following the IAB documentation) didn't show it, because the FTP-Data connections had no application details detected (the columns Application Protocol, Client, Application Risk and Business Relevance were empty). I explain why below.

On the 2nd FP I have configured:

  • Performance Sample Interval = 2 seconds
  • Bypassable Applications and Filters = All apps including unidentified apps (this option requires v.6.1.0.3)
  • Inspection Performance Thresholds - Drop Percentage=2

On this FP, IAB didn't hit any of the elephant flows, although other sessions passing through this FP module experienced 14% dropped packets (ping from monitoring systems towards several destinations).

Using the "Percent Packets Dropped" graph under Overview - Summary - Intrusion Event Performance, I could see a value of 0 on both FP modules, so the "Drop Percentage" threshold was not the trigger; only Flow Velocity could have detected these flows.

The FTP client's transfer summary showed an average speed of 45616.46 Kbytes/sec, so the Flow Velocity threshold was triggered correctly.

If logging at the end of the connection is enabled in the access control rule (ACR) that matches the traffic, you can check in the Connection Events whether the connections really qualify for the IAB settings:

- Application Protocol (in my case it was empty, so an IAB filter for specific applications would not match)

- Estimated flow speed in kbytes/sec: traffic quantity / duration, i.e. (max([Initiator Bytes], [Responder Bytes]) / 1024) / ([Last Packet] - [First Packet])

I'm using the larger of the initiator and responder bytes because in some cases (like an FTP transfer) the elephant flow is logged in the opposite direction from what you would normally expect, e.g.:

  1. client (initiator), src.port:high-dynamic, opens an ftp connection to the server (responder), dest.port:21
  2. when the client initiates a file transfer (put dummy_big_file.test), the ftp server opens an ftp-data session: server (initiator), src.port:20, opens an ftp-data connection to the client (responder), dest.port:high-dynamic

In this situation, the traffic of the file upload from the ftp client to the server appears, in my case, in the [Responder Bytes] column.

In my case, I had 2 FTP sessions from 2 different clients to the same server:

PC1 - FP3 - FP1 - FP2 - Server

PC2 - FP1 - FP2 - Server

For PC1, the numbers are: 

Responder Bytes: 3,412,996,403

First Packet: 2017-06-04 12:32:29

Last Packet: 2017-06-04 12:36:04

Estimated speed(using the above formula): 15,502kbytes/sec

This connection did not have the Reason "Intelligent App Bypass", as its speed is lower than the threshold configured on FP1. The flow was passing through a 3rd FP module (ASA5545), so I can understand why it was slower than the flow from PC2.

For PC2, the numbers are:

Responder Bytes: 3,405,015,564

First Packet: 2017-06-04 12:32:21

Last Packet: 2017-06-04 12:33:34

Estimated speed (using the above formula): 45,550kbytes/sec 

This connection event has the Reason "Intelligent App Bypass", as the flow speed was higher than the 45,000 kbytes/sec threshold configured in IAB on FP1 (based on both the calculation from FP data and the average speed shown by the FTP client).
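The estimate for PC1 and PC2 can be reproduced with a short Python sketch. The function name and argument names are my own, and the initiator byte counts are not given in the post, so 0 is passed as a placeholder; only the formula and the Responder Bytes / First Packet / Last Packet values come from the connection events above.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # timestamp layout as shown in the connection events


def estimated_flow_speed_kbytes(initiator_bytes, responder_bytes, first_packet, last_packet):
    """Estimate flow speed in kbytes/sec from connection event fields.

    Uses the larger of the two byte counters, because the bulk direction of a
    flow (e.g. active-mode FTP data) may be logged under Responder Bytes.
    """
    start = datetime.strptime(first_packet, TIME_FORMAT)
    end = datetime.strptime(last_packet, TIME_FORMAT)
    duration = (end - start).total_seconds()
    if duration <= 0:
        raise ValueError("Last Packet must be later than First Packet")
    return (max(initiator_bytes, responder_bytes) / 1024) / duration


# Initiator bytes are unknown here, so 0 is a placeholder (assumption).
pc1 = estimated_flow_speed_kbytes(0, 3_412_996_403, "2017-06-04 12:32:29", "2017-06-04 12:36:04")
pc2 = estimated_flow_speed_kbytes(0, 3_405_015_564, "2017-06-04 12:32:21", "2017-06-04 12:33:34")

FLOW_VELOCITY_THRESHOLD = 45_000  # kbytes/sec, as configured on FP1

print(int(pc1), pc1 > FLOW_VELOCITY_THRESHOLD)  # 15502 False
print(int(pc2), pc2 > FLOW_VELOCITY_THRESHOLD)  # 45550 True
```

This is only an average over the whole connection, so it can understate short bursts that IAB would see within a single sample interval.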

For the same connection, the connection event generated by FP2 showed the same numbers, but FP2 has IAB configured with a threshold only for Drop Percentage=2, which apparently was not exceeded (in fact it is shown as 0 in the Intrusion Event Performance screen).

Now, the explanation I've found (or am imagining) for the missing Application Protocol, Risk, Business Relevance, etc. in my tests is this:

In the Access Control Policy of both FP1 and FP2 I have a trust rule for traffic matching Application Protocol=FTP. The ftp session from the client to the server (the session from step 1 above) hits this access control rule and is trusted. I believe that FP is then unable to match the next session opened from the server to the client (ftp-data) as a child of the previously trusted FTP control session, because it no longer inspects the control session to see what the ftp server and client agreed to do/transfer.

What I don't understand is why this ftp-data session is not detected as application "FTP Data" (which would also be trusted by the ACR). For IAB, though, the consequence is that an app filter for FTP & FTP Data would not apply to these sessions, as they are not recognized. Maybe it's a bug, or a documented (mis)functionality that I haven't read about yet.

The above theory is based on the connection event logged by FP3 (unfortunately running version 6.1.0.1), where I could see the ftp data transfer with the Application Protocol, Client, App Risk & Business Relevance correctly recognized. On this FP3 module (apart from the different version), the trust ACR for the FTP control session is also missing. I will upgrade this sensor in the next few days to see whether the issue comes from the version or from how FP handles child sessions resulting from a previously trusted session (as with FTP/FTP-Data sessions).

Apart from the strange behavior of the application detector for ftp-data, I have a question that I couldn't find answered in the IAB documentation: once a flow is trusted, for how long will FP continue to trust it?

I'm thinking of Microsoft SMB sessions that might transfer a large file at high speed (for which IAB would be triggered because of the negative effect on performance). What happens when the elephant flow is finished? Will the next file transfer/SMB command be inspected by the normal Access Control Rules, or will it continue to be trusted until....?

I hope these details will help others struggling with IAB & elephant flows, and that I will also get some clarification for my IAB vs. Elephants headache. :)

Oh, great information you've got there! I will play around with IAB more, as we are still fighting CPU spikes from different sources.

Speaking of IAB, I'd even like to base the Trust decision on the amount of data within the sample interval: instead of checking velocity every second or two, just check whether there is a session with 100+ MB in 5 seconds, for example. That would probably be less stressful for FP, as it has to check active sessions and their state less frequently. Flow velocity may be a bit misleading in my opinion, since I have nothing against short, fast flows. The problem usually starts when a flow has been going long and fast enough and/or there are other flows with potential elephant characteristics. But yes, I will have to play around to see if I have more to add regarding IAB; after that it is a matter of tuning. Well, if I get it to work at all. :)

A few things to add from my side:

FTP-Data is indeed not detected correctly. I have a bug filed for that, but it is not public: "CSCvb33410 file transfer not blocked using ftp-data application ID in ACP". If you need more information and the description, try asking your Cisco contacts, but basically AVC didn't recognize FTP-Data properly, so using FTP-Data in the ACP wasn't viable. The issue still seems to be present on 6.0.1.3. Dunno about 6.1.x.

When a flow/session is trusted, it is basically removed from any FP interaction. FP lets the ASA handle that traffic from then on (the so-called fastpath) by using DAQ. So as soon as a session is Trusted, it remains trusted for as long as the session is alive. If a new session is created on the ASA, it is checked against the ACP and the action depends on the rules configured.

We had a TAC session regarding Trust rules and monitor-only mode, so one more gotcha: Trust rules do not work properly in monitor-only mode (CSCvd93394). That means you cannot test them properly in monitor-only mode; you have to put FP inline.

Hello Niko,

Thank you very much for the information about the FTP-Data bug. I'm looking forward to the version that fixes it (among other things).

Coming back to IAB, I discovered why I had no connection events bypassed by IAB on FP2:

I had configured a threshold only in the "Inspection Performance" category (Drop Percentage=2), and the documentation (at least for v6.1.0.3) states:

"Inspection performance and flow bypass thresholds are disabled by default. You must enable at least one of each, and one of each must be exceeded for IAB to trust traffic."

That means I also had to configure a threshold in the "Flow Bypass" category, and I chose Flow Velocity. After that, I had connection events with Reason = "Intelligent App Bypass" on the FP2 module as well. Initially I overlooked this detail in the documentation, and only by chance had I configured one threshold from each category on FP1.
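The "one of each" rule from the documentation quote can be modeled roughly like this in Python. This is only a sketch of the documented behavior, not how FP implements it; the function name and the threshold labels are made up for illustration.

```python
def iab_trusts_flow(perf_thresholds_exceeded, flow_thresholds_exceeded):
    """Return True only if at least one threshold in EACH category is exceeded.

    Both arguments are sets of threshold names (hypothetical labels) that are
    enabled and currently exceeded. A category with no enabled threshold is
    always an empty set, so IAB can never trust traffic in that setup.
    """
    return bool(perf_thresholds_exceeded) and bool(flow_thresholds_exceeded)


# FP2 as first configured: only an Inspection Performance threshold exists,
# so nothing in the Flow Bypass category can ever be exceeded.
fp2_before = iab_trusts_flow({"drop_percentage"}, set())

# FP2 after adding Flow Velocity, with both thresholds exceeded by an elephant flow.
fp2_after = iab_trusts_flow({"drop_percentage"}, {"flow_velocity"})

print(fp2_before, fp2_after)  # False True
```

The point of the sketch is that a single-category configuration such as the original FP2 setup returns False no matter how fast the flow is.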

In your case, besides the "Flow Bypass Thresholds" - Bytes per Flow=102400, you should configure a threshold in the "Inspection Performance Thresholds" category. In my experience, Drop Percentage=1 does the job of trusting elephant flows when needed, with little impact on the rest of the traffic.

I was also thinking that Processor Utilization Percentage might do the trick, but I don't know what exactly it measures: the CPU usage for processing the elephant flow only, or the CPU usage of the core running the snort process that handles the elephant flow (and other traffic). In theory (my imagination), CPU=100% should mean something like Elephant=80% and other traffic=20%. Anyway, I'm happy with how Drop Percentage=1 reacts.

If you choose the same threshold, be aware that the documentation states "specifying an integer greater than 1 activates IAB when the specified percentage of packets is dropped. When you specify 1, any percentage from 0 through 1 activates IAB. This allows a small number of packets to activate IAB."

For the speed threshold, I used this "estimation" formula: 85% * ((DT / 2) / Snort-CPU)

DT = Cisco's documented throughput for AVC & IPS in Mbps

Snort-CPU = number of CPUs assigned to snort processes (in the SFR CLI, run the command "sudo pmtool show affinity | grep -i snort"). In my case, on FP1 and FP2 snort has 8 instances assigned to 8 CPU cores, even though the box has 24 CPU cores (yeah, at least half of them show a continuous 0.0% usage in the top command).

For the 5585-X SSP-60, with DT=6.5 Gbps, my threshold speed based on the above is 85% * ((6.5 * 1024 / 2) / 8) = 353.6 Mbps, which is 45,260.8 kbytes/sec (353.6 * 1024 / 8), so I rounded it down to 45,000.
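For anyone who wants to plug in their own platform numbers, here is the same estimation as a small Python sketch. The function and parameter names are mine, and the 85% headroom and the division by 2 are just the rule of thumb from this post, not an official Cisco sizing formula.

```python
def flow_velocity_threshold(dt_gbps, snort_cpus, headroom=0.85):
    """Estimate a per-flow Flow Velocity threshold.

    dt_gbps    -- documented AVC & IPS throughput of the platform, in Gbps
    snort_cpus -- number of CPU cores assigned to snort instances
    headroom   -- fraction of the per-instance share to use (85% in the post)

    Returns (threshold in Mbps, threshold in kbytes/sec).
    """
    if snort_cpus <= 0:
        raise ValueError("snort_cpus must be a positive number")
    dt_mbps = dt_gbps * 1024                       # Gbps -> Mbps, as in the post
    mbps = headroom * ((dt_mbps / 2) / snort_cpus)
    kbytes_per_sec = mbps * 1024 / 8               # Mbps -> kbits/sec -> kbytes/sec
    return mbps, kbytes_per_sec


# 5585-X SSP-60: DT = 6.5 Gbps, 8 snort instances
mbps, kbytes = flow_velocity_threshold(6.5, 8)
print(round(mbps, 1), round(kbytes, 1))  # 353.6 45260.8
```

Rounding the result down (45,000 in my case) keeps the configured threshold slightly below the estimate.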

Regarding the Sample Interval of 2 seconds, I didn't see an increase in the "Load Average" compared with the days when IAB was off, and the RTT in the monitoring systems stays at the same level as before. In theory it should have an impact (it is something more for the CPU to do), but on my FPs the impact is not noticeable (I really hope it uses other CPU cores to monitor the performance of the 8 that run the snort processes).

Cheers!

Did you move IAB out of test mode? Does it work as expected?

Yes, and it is working as expected. The number of bypassed elephants is ~40/day for an internal firewall, mostly SMB transfers.

For the outside firewalls I would not activate such an option.

Well, TAC is asking me to turn this on to see if it helps with my snort process crashing. I am running it in test mode, but I don't see any logs regarding IAB when my snort process blows up.

Sample interval of 2

drop % = 1

flow velocity = 15000

SNORT crashing

Severity: critical

Module: Process Status

Description: The Primary Detection Engine process terminated unexpectedly 1 time(s).


I've seen this kind of error, but I think it was related to the versions I was using at that time (SFR 5.4.0.8 on ASA5585-SSP60 and SFR 6.1.0.0 on ASA5545).

One note: on SFR 5.4.0.8 there was a "cannot run validator" error in /var/log/messages on the SFR just before the DE exited, something like these:

snort[25114]: cannot run validator cisco_1e1.....b, error: [string ""]:1: ')' expected near '$'

snort[25110]: cannot run validator cisco_774......e, error: [string ""]:1: '=' expected near 'char(3)'

After I upgraded to 6.1.0.1 and installed hotfixes O (build 3) and P (build 1), the PDE exits were not an issue anymore.

Regarding IAB, I implemented it for performance reasons (legitimate traffic was delayed while SFR was inspecting elephant flows). Link to Cisco doc

If you get high CPU alerts before the PDE exit alerts, they may be related to elephant flows, and then IAB would help. But while elephants were passing through the SFR, I never saw errors about the PDE terminating unexpectedly, only high CPU alerts and traffic latency increasing to over 2-3 seconds (from ~50ms).

My problem is not high CPU, just Snort crashing. I don't know if 29% memory consumption is normal, but I don't have any logs to go by when it crashes. :(

top - 20:41:38 up 83 days, 15:48, 1 user, load average: 0.18, 0.25, 0.29
Tasks: 105 total, 1 running, 104 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.9%us, 1.7%sy, 0.0%ni, 95.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.4%
Mem: 3299840k total, 2925632k used, 374208k free, 11524k buffers
Swap: 3310716k total, 1029340k used, 2281376k free, 530496k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15662 sfsnort 1 -19 2003m 942m 28m S 7 29.2 41:24.77 snort
15661 sfsnort 1 -19 1986m 949m 29m S 6 29.5 39:53.19 snort

To bypass flows without inspection, I would highly recommend using the pre-filter policy. It is much easier to configure than IAB, although with higher traffic flows I have seen the snort process produce a crash dump.

 

Vaibhav

This is great and very informative. We have the same issue.

We cannot trust any traffic outright, so would trusting based on details such as src, dest and port help?

 

SMB/NetBIOS/SMTP are sometimes the highest contributors, and inspection is mandatory for them.

As a suggestion: it would help if IAB let us apply our own custom rules, or simply an ACL, to decide what to trust directly.
