Not achieving failover results desired

Willard Dennis
Level 1

Hi all,

I am tasked with providing an internal routing solution to replace a branch's ASA as the router for the internal networks. Right now, they have two internal networks, and whoever set them up used one interface on the ASA for LAN1 ("78-net" in the diagram below) and another interface for LAN2 ("79-net" in the diagram below). However, they have been having transfer speed problems between the two internal LANs, and now they need to add a third LAN. So I decided to put an L3 switching layer in place in front of the ASA and make it redundant (right now, if the ASA fails, they lose not only their Internet connection but also their internal router).

So here's a diagram of what I came up with:

New_Branch_Network_Arch.png

(I am using two existing switches, so that's why the disparity in switch models on the new layer... They use all gig-eth switches there, but I didn't have but one L3 gig-eth switch available, so I'm using a 10/100 3560 as the HSRP standby switch.)

In any case, I have the solution set up in a test-bed environment, and it works well, except when I power off the 3750G to simulate a system failure. In that case, I'm losing about 65 or so pings while the failover takes place (pings from an end-station on the bottom "79-net" switch to a remote end-station over a site-to-site VPN tunnel of which the ASA is the local endpoint.)

I have set the HSRP timers on the two L3 switches to 1-second hellos and a 3-second hold time (I know I can use msec timer values, but I was following a Cisco best practices doc for HSRP that recommended "standby n timers 1 3" as a minimum). So I don't think that's the problem, nor do I think it's an EIGRP issue. However, I am using the default "pvst" STP mode on the switches, and I'm wondering if this is the cause of the problem (STP has to unblock the links from the 3560 down to the lower-layer switches when the 3750 links go down). Right now on my testbed, the lower-layer switches are non-Cisco, but in production they are Cisco switches. Should I use Rapid STP in this case to improve the STP convergence time, and/or are there any other tweaks I can make to improve the failover time? I'd like to get the ping loss down to only 1-2 lost (or better) if I can.
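For reference, the HSRP timer setting described above would look roughly like this on the switch intended to be active. This is only a sketch: the VLAN ID, group number, addresses, and priority are illustrative placeholders, not values from the actual configuration.

```
! Hypothetical SVI for the 79-net; VLAN ID, group number and
! addresses are placeholders, not taken from the real setup
interface Vlan79
 ip address 192.0.2.2 255.255.255.0
 standby 79 ip 192.0.2.1
 ! Higher priority plus preempt makes this switch active when it is up
 standby 79 priority 110
 standby 79 preempt
 ! 1-second hellos, 3-second hold time
 standby 79 timers 1 3
```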

Thanks all --

Will

Accepted Solution

Jon Marshall
Hall of Fame

Will

Yes, use RSTP; that will significantly shorten the failover. If you could get another 3750, you could "remove" STP from the equation (although it would still be running) by using the 3750s as a switch stack and then running MEC (multi-chassis EtherChannel) from the access-layer switches to the switch stack. I'm still not sure this would account for 65 pings, though.

Another alternative is to make the link between your 3750 and 3560 an L3 link. That way both uplinks from the access-layer switches will be forwarding. The recommendation with this design is to isolate a VLAN per access-layer switch, i.e. no VLAN spans multiple access-layer switches, but from your diagram that is exactly what you have anyway.

I'm also not sure about the ASA setup. It has two inside connections on /30s. How does the ASA know which interface to use to reach the internal subnets? What does a "sh route" show on the ASA for the internal subnets?

And how have you set up the VPN and NAT on the firewall, if there is any?

What I would do is:

1) Change to RSTP.

2) Then check how long it takes to fail over. If it is still taking a long time, I would start looking at the ASA and the possibility that it is getting confused.

But like I say, change to RSTP first. If it is still a problem, then come back.
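As a rough sketch of step 1 on Catalyst switches such as the 3750 and 3560, the change is a single global command on each switch. The access port shown is a hypothetical placeholder. Fast convergence generally depends on every switch in the spanning-tree domain running a rapid mode and on end-station ports being treated as edge ports.

```
! Global configuration on each Cisco switch in the spanning-tree domain
spanning-tree mode rapid-pvst
!
! Hypothetical end-station port; the interface name is a placeholder
interface GigabitEthernet1/0/10
 switchport mode access
 spanning-tree portfast
!
! Afterwards, confirm the mode from exec mode with:
! show spanning-tree summary
```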

Note: don't try the L3 link option just yet, because if the ASA is causing problems you may need to stick with an L2 link.

Jon

3 Replies

Thanks, Jon. Changing over to RSTP did the trick; I now lose only 2-3 pings when I take down the 3750.

I did check the routing table on the ASA during the failure simulation. When I pull the plug on the 3750, it only takes a second or two for the ASA to fail over from the link to the 3750 to the link to the 3560 to reach the three VLAN subnets. However, on my test ASA (a 5510, which only has 10/100 NICs), once it fails over to the link to the 3560, it stays there even after the 3750 comes back up, because the cost of the two links is the same. (On the production ASA the link costs will differ: it has gig-eth links, so the cost to the 3750 will be lower than the cost to the 3560.) Do you know offhand how I can weight the link to the 3750 so that the ASA will always use it when it is available?

Thanks again,

Will

Will

Interestingly enough, I am dealing with a question in the firewalling forum about this. You should be able to use offset-lists on the 3560 to increase the metric, so that the 3750 is the preferred route when both switches are up.

However, that doesn't seem to be working for the guy in the firewalling forum, but it may be worth a try in your scenario.
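A minimal sketch of the offset-list idea, assuming the 3560 advertises the internal subnets to the ASA via EIGRP. The AS number, offset value, and interface name are placeholders, and as noted above this may not behave as hoped with the ASA.

```
! On the 3560 (the standby switch). AS number 100, the offset of 1000
! and the interface facing the ASA are all placeholders.
router eigrp 100
 ! ACL 0 matches all networks; the offset is added to the metric of
 ! routes advertised out of the named interface
 offset-list 0 out 1000 FastEthernet0/24
```

With both switches up, "sh route" on the ASA should then show the internal subnets via the 3750 only, if the offset is taking effect.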

Jon