Solved: Re: High Sessions on FWSM 3.2(6) Causing Failover

GrumpyBear · 06-17-2010

Over the last couple of vulnerability scans of devices behind a pair of FWSMs (Active/Standby, single context, running 3.2(6)), we have noticed that failover occurs because no heartbeat is received within the 30s window.

We have deployed a new Nessus scanner on a dual quad-core machine with lots of RAM. After the first failover we forced the scanner's NIC down to 100Mbps, but the failover still occurs.

I believe the heartbeat messages are not getting through because of the CPU overhead of session creation and teardown, combined with debug-level logging to two destinations: our MARS210R and a Linux syslog server.

While the FWSM spec sheet states 100,000 conns/sec, our utilization monitoring (MRTG) shows a maximum of only 2,973 conns/sec on a 5-minute smoothed average. The MARS reports 100,000 events/min from the FWSM, and the syslog server processed 1.63 million log messages in the 5-minute interval in which the failover occurred.
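
Converting these figures to a common per-second rate makes them easier to compare. This is only a back-of-the-envelope check and assumes the MRTG peak and the syslog count describe roughly the same interval, which the post does not state:

```latex
\[
\frac{1{,}630{,}000\ \text{messages}}{300\ \text{s}} \approx 5{,}433\ \text{messages/s},
\qquad
\frac{100{,}000\ \text{events}}{60\ \text{s}} \approx 1{,}667\ \text{events/s}
\]
\[
\frac{5{,}433\ \text{messages/s}}{2{,}973\ \text{conns/s}} \approx 1.8\ \text{messages per connection}
\]
```

Under that assumption, the syslog rate is close to two messages per connection (for example, one connection-build and one teardown message), so log volume scales with the scanner's connection rate rather than with the traffic volume.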

Questions:

Would an FWSM OS upgrade help here (e.g., something that gives the heartbeats a better CPU slice)? We are constrained to 3.2(6) due to OS dependency H$LL with CSM.

Would a dedicated rule that disables debug logging for traffic from the Nessus server lower CPU utilization?

I'm not really comfortable adjusting the failover timers: I want the devices to keep responding quickly to a real failure, rather than loosening the timers just because we are shooting ourselves in the foot :-0

twitchy/pri/act# sh failover
Failover On
Failover unit Primary
Failover LAN Interface: failover Vlan 2 (up)
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 15 seconds
Interface Policy 50%
Monitored Interfaces 0 of 250 maximum
failover replication http
Config sync: active
Version: Ours 3.2(6), Mate 3.2(6)

Regards,

Bruce

Panos Kampanakis · 06-17-2010

Yes, logging on ACLs will spike the CPU. Remove it and you should see a significant improvement.

SNMP and routing protocols can also spike the CPU.

I hope it helps.

PK

GrumpyBear · 06-17-2010

Thanks. I added explicit permits for the Nessus server with "logging disabled" appended, since we still need the debug-level logs for the rest of the traffic so the MARS can accurately recognize false positives. I had been dropping the Nessus events on the MARS, but it makes more sense to tune the rule at the source.
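A rule like the one described might look roughly as follows in FWSM/ASA-style ACL syntax, where the per-entry keyword is `log disable`. This is a sketch only: the ACL name `inside_access_in` and the scanner address `10.10.10.50` are hypothetical placeholders, not values from this thread. Because ACLs are first-match, the entry is inserted at line 1 so it is evaluated before any broader entry.

```
! Hypothetical placeholders: "inside_access_in" stands in for the real ACL name,
! 10.10.10.50 for the Nessus scanner's address.
! "line 1" places this entry ahead of broader entries, since the first match wins.
access-list inside_access_in line 1 extended permit ip host 10.10.10.50 any log disable
```

As far as we understand, `log disable` governs the ACL's own log messages; connection build/teardown messages are controlled separately by the global logging configuration. It may be worth checking which message IDs dominate the syslog volume during a scan before relying on this change alone.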

Regards,

Bruce

Panos Kampanakis · 06-17-2010

I believe you will see a big CPU improvement.

Good luck...

PK

GrumpyBear · 09-27-2010

OK - so on to the next round of Quarterly Vulnerability Scans ...

The FWSM failed over again.

Time to adjust the timers ...
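The unit timers reported by `sh failover` in the first post (poll frequency 1 second, holdtime 15 seconds; interface poll frequency 15 seconds) correspond to the `failover polltime` commands below. This sketch lengthens only the unit holdtime. The value 30 is illustrative, and the supported ranges and any required holdtime-to-polltime ratio on FWSM 3.2(6) should be confirmed in the command reference before applying it.

```
! Current values, matching the "sh failover" output in the first post
failover polltime unit 1 holdtime 15
failover polltime interface 15
!
! Illustrative change only: a longer unit holdtime means a short CPU spike is
! less likely to drop enough hellos to trigger a failover.
failover polltime unit 1 holdtime 30
```

The trade-off is the one raised in the original post: a longer holdtime also delays detection of a genuine unit failure by the same margin.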