Solved: Hi Sec IT,

secureIT · 07-09-2016

Hi All,

I have an FWSM running 4.1(2) installed in a 6500 switch, and I have observed intermittent high CPU on it.
There are 15 contexts configured on the module, and only one of them is experiencing high CPU.
I also noticed a very large "most used" connection count:

FWSM/FWSM-FW# sh conn
1000 in use, 113000 most used

The data sheet says:
- 5 Gbps throughput
- 100,000 connections per second (cps)
- 1 million concurrent connections (10 lakh)

My question is whether the 10 lakh connections are divided equally among all the contexts or shared dynamically among them.
Since this is the only context with such a high "most used" count, I suspect heavy traffic or an attack. To check, I am planning to run the following command while the problem is happening: show local-host | include host|count/limit
However, I cannot access the device when the CPU goes to 90% or higher.

For example, is the 10 lakh connection limit from the datasheet split across the 15 contexts, so that each context statically gets 1,000,000/15 (about 66,666) connections? Or, when a context needs more connections, can it borrow them dynamically from the other contexts? I would also like to know how to block attacks on the FWSM.

class default
limit-resource All 0
limit-resource IPSec 6
limit-resource Mac-addresses 65535
limit-resource ASDM 6
limit-resource SSH 6
limit-resource Telnet 6

I'm having issues with context CTX_4; I have attached the show resource usage output from the system context.

Kindly have a look and help.


m.kafka · 07-12-2016

You should configure resource classes for every context.

In your current setup it's all-you-can-eat, first-come, first-served. So if one context eats all 1 million ("10 lac" is used in India?) connections, no other context can accept new connections.

Brief config guide here:

http://www.cisco.com/c/en/us/td/docs/security/asa/asa90/configuration/guide/asa_90_cli_config/ha_contexts.html#95692
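A minimal sketch of a per-context resource class, entered in the system context. The class name ctx4-class and the 20% value are hypothetical placeholders, not recommendations (20% of 1 million is 200,000 concurrent connections, above the 113,000 peak seen so far). The linked guide is for the ASA, so check the syntax against the FWSM 4.1 configuration guide before applying it.

```
! System context: class capping concurrent connections
! (class name and percentage are hypothetical examples)
class ctx4-class
  limit-resource conns 20%
!
! Assign the context to the class; contexts that are not
! assigned to any class stay in class default
context CTX_4
  member ctx4-class
```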

Unfortunately you can't limit cpu at the moment.

Best regards, MiKa

secureIT · 07-12-2016

Hi, from the system context configuration I posted, can you tell what the connection limit is?

m.kafka · 07-13-2016

Hi,

from the attached text document in your original post I can tell that connections are unlimited. You can change connection limits either in the GUI or on the command line.

Did you take a look at the guide I linked?
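To confirm which class each context belongs to and which limits apply, these show commands from the system context are a reasonable starting point. This is a sketch; the exact keywords and output fields may differ between FWSM and ASA releases.

```
! System context: list resource classes and their limits
show class
! How the limits are allocated across contexts
show resource allocation detail
! Current, peak and denied counters for the affected context
show resource usage context CTX_4
```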

Rgds, MiKa

secureIT · 07-16-2016

Hi Mika,

Thanks. Do you mean that, with the current configuration, connections can be borrowed from other contexts if required?

The source of the high CPU has been identified and fixed, but the FWSM did not help me find it: I could see dropped packets on all interfaces, including the one toward the server that caused the issue.

m.kafka · 07-16-2016

Hi Sec IT,

about the connections: yes, more or less. In your current configuration any context can use connections from the global connection pool. If this global pool is depleted, no other context can establish a new connection. In your case the number of connections is not the big issue; I can see that no connection has been denied. But if your FWSM suffers from high CPU, I would suggest limiting the connection rate to stabilize the network (this degrades the user experience, because some connections might time out during peak times and users will need to retry). In the long run you should look into scaling up or scaling out to accommodate the traffic, or redesign the firewall policies and perhaps optimize your inspections to save CPU.

I suggest you also look into the SSH issue: almost every context shows SSH sessions being denied. Do you normally have a high number of SSH sessions? Or is the SSH port perhaps exposed to the internet, with someone probing it or trying to brute-force your SSH login? That would also cost a lot of CPU.
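If the SSH port turns out to be reachable from untrusted networks, one option is to permit SSH only from a management subnet inside each context. A minimal sketch; the subnet 192.0.2.0/24 and the interface names inside and outside are hypothetical placeholders for your own values.

```
! Inside the context (from the system context: changeto context CTX_4)
! Remove a broad entry if one exists (hypothetical example)
no ssh 0.0.0.0 0.0.0.0 outside
! Allow SSH only from the management subnet
ssh 192.0.2.0 255.255.255.0 inside
! Drop idle sessions after 5 minutes so they free up SSH slots
ssh timeout 5
!
! Check who is currently connected
show ssh sessions
```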

Take a look at the link I posted in my first reply and limit the connection rate as a first step, at least for CTX_4. I can't tell you exactly how much, but look at the peak connection and fixup rate of 12000. If that still pushes the CPU to the ceiling, try a lower value, maybe 70% of the measured peak (8400, so somewhere around 8000 or 9000). Fixups rewrite values such as IP addresses or ports in the higher-layer payload to match the XLATs, and they use a lot of CPU.
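As a sketch of the rate limit described above: 70% of the measured 12000/sec peak is 0.7 × 12000 = 8400/sec. The class name ctx4-rate is hypothetical, and the values are only the starting point suggested above, to be tuned against the CPU.

```
! System context: cap new connections and fixups per second for CTX_4
! 8400 = 70% of the measured 12000/sec peak (class name is hypothetical)
class ctx4-rate
  limit-resource rate conns 8400
  limit-resource rate fixups 8400
!
context CTX_4
  member ctx4-rate
```

A context can be a member of only one class, so if CTX_4 is already assigned to another class, add the two limit-resource rate lines to that class instead.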

Hope that helps and that you can take it from here.

Rgds, MiKa

secureIT · 07-16-2016

Thank you so much Mika for the detailed explanation.

My issue has been fixed for now. It came from a server that was generating unwanted outbound connections; we found it by troubleshooting our major servers and setting the firewall aside. After removing that server, the CPU % came down. Is there any way to find this kind of issue? Unfortunately, show local-host | in host|count/limit and show conn did not help. Running clear traffic and then show traffic after one minute did show packet drops on all interfaces. My question is how to trace the problematic device from the firewall's point of view: should I look at max bytes/sec, max pkts/sec, or max drops per interface? :)

m.kafka · 07-16-2016

Troubleshooting, monitoring, and logging in high-performance networks is difficult. For future cases, maybe look into NetFlow. There are good open-source NetFlow collectors out there, and NetFlow might have the lowest impact on a device under heavy load.
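One way to get NetFlow data without adding load to the FWSM is to export it from the Catalyst 6500 itself, assuming the traffic to and from the FWSM VLANs is routed through an SVI on the switch. A minimal sketch for a Sup720 running IOS; the collector address 192.0.2.50, UDP port 9996 and Vlan100 are hypothetical placeholders, and the supported commands vary by supervisor and IOS release.

```
! Catalyst 6500 global configuration (sketch)
! Collector address, UDP port and VLAN are hypothetical placeholders
mls netflow
mls nde sender
ip flow-export version 5
ip flow-export destination 192.0.2.50 9996
!
! Collect flows on the SVI that carries traffic to/from the FWSM
interface Vlan100
 ip flow ingress
```

A collector can then rank hosts by flows or packets per second, which points at a single noisy server more directly than interface-wide drop counters do.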

Best regards, MiKa

secureIT · 07-16-2016

Hi Mika, thank you so much for the detailed explanation. The issue was resolved after I suspected a server and removed it from the network. Is there any way to trace problematic servers from the firewall side? show local | in host|count/limit and show conn count did not help, and neither did clear traffic followed by show traffic for tracing the server or interface. show traffic showed many interfaces with high max bytes/sec and max pkts/sec; which one should I consider? show interface also showed many interfaces with a high number of dropped packets. Which commands can help me here?
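A possible order for narrowing things down from the firewall side: first find the busy context from the system context, then look at per-second rates and per-host connection counts inside it. This is a sketch, not a verified procedure; the host address 10.1.1.25 is a hypothetical placeholder, and show perfmon reports per-second rates for items such as connections, xlates and fixups.

```
! System context: overall CPU and the busy context's counters
show cpu usage
show resource usage context CTX_4
!
! Move into the busy context
changeto context CTX_4
show conn count
show perfmon
! Hosts with unusually high connection counts stand out here
show local-host | include host|count/limit
!
! Drill into a suspicious host (address is a hypothetical placeholder)
show local-host 10.1.1.25
```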

Cisco FWSM high cpu