05-06-2005 05:14 AM
I am testing a dual 6500 switch configuration with FWSM and CSM. Both CSM and FWSM have stateful failover and MSFC has HSRP. CSM is bridged and FWSM is routing.
The configuration is basically Client - MSFC - FWSM - CSM - Real server.
The 6500s have 10G trunk between them and I have two access switches (L2) on the real server side. The L2 access switches provide a secondary path for the fault tolerance VLANs. I am pinging machines connected to each L2 switch.
When I reload the primary box, all the connected machines continue to ping and the real servers drop for approximately 5 seconds before the CSM fails over. I am satisfied with the compromise of CSM timeout and stability (I found if the CSM timeouts were too small R-PVST convergence was not fast enough and both CSMs bridged, creating a broadcast storm).
When the primary switch returns to service, HSRP and the firewall both fail back seamlessly. However, when the CSM fails back, I can no longer ping the real server.
Even though I cannot ping the real servers, I can still access the servers via the VIP. After a period of time the pings return (presumably after a timeout, approx 60sec).
While the pings from outside are failing, if I ping the default gateway (FWSM interface) from the real server, the FWSM responds and the pings from the outside suddenly restart.
I think the issue is to do with either the ARP cache in the FWSM or a problem with the movement of the real server mac-address within the switch network.
Does this ring any bells for anyone?
05-06-2005 02:30 PM
Hi,
We had a somewhat similar issue. We upgraded the FWSM to 2.3.2 and the problem seems to have disappeared.
Best regards,
Pascal
05-06-2005 05:34 PM
Thanks Pascal,
I will give it a try on Monday.
Cheers,
Dave B
05-07-2005 12:41 AM
I believe that would be the following bug fixed in 2.3.2 of FWSM.
CSCeg53853 - FWSM fails to update ARP entry when a packet is not targeted for FW
Gilles.
05-08-2005 08:00 PM
I checked the FWSM code today and it is already at 2.3.2. However, I have found the source of the problem.
When the CSM function moves from the secondary back to the active, the mac-address of the real server remains on the Po258 of the CSM that has been taken out of service.
If I clear the mac-addresses of Po258 on the inactive CSM, the switch broadcasts the frame and the return frame updates the mac-address table, forwarding frames towards the active CSM. Similarly, if the real server sends a frame, the mac-address table is also updated.
Now to find the solution. I think I will raise a TAC case for it.
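To illustrate the behaviour described above, here is a minimal Python sketch of a toy L2 forwarding table. It is not how the 6500 implements its table; the class, the port labels and the MAC address are all hypothetical and only model "stale entry, clear, flood, relearn".

```python
class MacTable:
    """Toy L2 forwarding table mapping MAC address -> egress port."""

    def __init__(self):
        self.entries = {}

    def learn(self, mac, port):
        # A frame sourced from `mac` arrived on `port`: (re)learn it.
        self.entries[mac] = port

    def lookup(self, mac):
        # Unknown destination: the switch floods the frame in the VLAN.
        return self.entries.get(mac, "flood")

    def clear_port(self, port):
        # Equivalent in spirit to clearing dynamic entries on one port.
        self.entries = {m: p for m, p in self.entries.items() if p != port}


def failback_scenario():
    """Return the forwarding decision at each step of the CSM failback."""
    table = MacTable()
    server = "00:00:00:00:00:01"         # hypothetical real server MAC
    standby_port = "Po258-standby-csm"   # hypothetical label
    active_path = "toward-active-csm"    # hypothetical label

    steps = []
    # Entry learned while the (now standby) CSM was still bridging.
    table.learn(server, standby_port)
    steps.append(table.lookup(server))   # frames still sent to the standby CSM

    # Clearing the stale entry makes the destination unknown, so it floods.
    table.clear_port(standby_port)
    steps.append(table.lookup(server))

    # The server's reply (or any frame it sends) arrives via the active CSM.
    table.learn(server, active_path)
    steps.append(table.lookup(server))
    return steps


if __name__ == "__main__":
    print(failback_scenario())
```

The middle step is the one that matters: until something removes or overwrites the stale entry, frames for the real server keep going to the CSM that is no longer bridging.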
05-09-2005 10:47 AM
Dave,
Configuring a shorter time for the mac address table aging time might solve the problem:
mac-address-table aging-time 10 vlan
Best regards,
Pascal
05-09-2005 03:26 PM
Thanks Pascal,
I raised a TAC case yesterday.
I shortened the aging timer on the Client side yesterday to 10 seconds, which improved it quite a bit, but it is still by no means hitless.
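A rough way to see why a shorter aging timer helps but cannot make the failback hitless is sketched below in Python. It assumes the stale entry is not refreshed after the failback and simply expires on the aging timer; the function name and the example timings are hypothetical, not measurements from this setup.

```python
def blackhole_window(aging_time, last_seen, failback_at):
    """Seconds after failback during which a stale entry still attracts frames.

    Assumes the entry was last refreshed at `last_seen`, is never refreshed
    again, and is removed once `aging_time` seconds have passed since then.
    """
    expires_at = last_seen + aging_time
    return max(0, expires_at - failback_at)


if __name__ == "__main__":
    # Entry refreshed right at the moment of failback: worst case, the full timer.
    print(blackhole_window(aging_time=10, last_seen=0, failback_at=0))
    # Entry last refreshed 4 seconds before failback: shorter outage.
    print(blackhole_window(aging_time=10, last_seen=0, failback_at=4))
```

Under these assumptions the outage is bounded by the aging time but only reaches zero if the entry had already expired before the failback, which is consistent with "improved, but not hitless".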
There was a bug in earlier CSM code that resulted in gratuitous ARPs not being sent for real servers on failover, but it has been fixed in later code. I am going to roll back the code from 4.2.1 to 3.1.10.
I am sure when I tested it under 3.1 code, the failover was hitless.
I will update the forum when I have tested it under 3.1.10.
05-10-2005 04:49 AM
Same thing happens under 3.1.10 code.
Back to the drawing board.
I have observed the same issue when failing the CSM from primary to secondary (and vice versa).
I admit 5 seconds is not too bad when the entire primary unit fails (switch, FWSM and CSM), but it takes at least 10 seconds when the primary returns to service!