Failover routing design help Needed

Hello.

We are looking to have a setup like this:

                      User PCs
                          |
                          |
                          |
                        3750x
                  (stack - ip base)
                   /             \
                  /               \
                 /                 \
Servers ------ 3750x ------------- 3750x ------ Servers
            (stack -            (stack -
          ip services)        ip services)
                |                   |
                |                   |
             Router              Router
                |                   |
                |                   |
              ISP1                ISP2

We would like routing (and VLANs) to be handled on the switches, with internet failover from ISP1 to ISP2 if ISP1 fails and failback to ISP1 when it comes back up. There would be trunks between all switches. We would also like to have all devices on the same VLAN if possible.

What is the best approach to do this?

(Note that the left and right sides [in brown and green font] are in separate site locations, and that the user-end [in red font] switches only have IP Base, which limits EIGRP functionality.)

We tried following this, but it doesn't fit our site exactly:

http://www.geekmungus.co.uk/cisco-and-networking/failoverinternetconnectionusingipslatrackingandeigrproutingforinter-sitelinks

(We also ran into an issue where the switch in the middle would have two routes to the internet, so there may be a problem with route priority.)

Thanks in advance


No problem. I just rechecked the configs and actually STP is blocking on the uplink to 3750_1, not 3750_2, which means all client to server traffic is going via 3750_2 to get to 3750_1.

Not what you want at all.

Should be able to post stack configs within about half an hour.

Jon

Jon Marshall
Hall of Fame

Attached are the new configurations. Probably a good idea to have a look over them and make sure you are happy with all of it and understand what we are doing.

There are a couple of outstanding things -

1) I have assumed you are running EIGRP on the routers, so I have included the changes needed on 3750_1 and 3750_2 for this. You would also need to configure the router LAN interface with the corresponding IP address and change the EIGRP config on the routers.

If you are using static routes for the internal subnet(s) then we would need to modify the IP Services stack configs to reflect this. We would also need to modify not just the LAN IP of the router but also the static routes, because the next hop IPs on the 3750 stacks use different addresses.

2) I haven't included the delay on the 3750_3 uplink to 3750_2. So with the config supplied, 3750_3 would see two equal cost routes to vlan 10 and the default. I will update with the delay before you implement.

Other than that I should have covered everything.

Like I say, info on the router config and the links between sites (ie. can they run as routed links) may mean slight changes to the configs depending on the answers, so let me know.

Jon

Jon Marshall

I accidentally deleted my last post. Hope you got it in your e-mail but if you didn't let me know. It was about the delay setting on 3750_3.

Jon

Hi Jon,

Thanks again for this.

I'll start the implementation from tomorrow.

The only thing is, we are not running EIGRP on the routers, because they are not Cisco.

So can you let me know what we should change the LAN IP of the router to and also the static routes for the next hop?

Okay, this may or may not be an issue so you need to check.

Because everything is currently in vlan 10 the routers do not need a static route for the clients (sorry, I should have thought of that).

Some routers cannot add static routes for remote subnets (they are usually routers with very limited functionality). If this is the case then none of what I've proposed will work, because you need to configure two static routes on the routers: one route for the server subnet and one route for the new client subnet. So can you check that you can do this on the routers before implementing anything?

If they do support static routes then you need to do this -

1) 3750_1 - remove the network statement for 192.168.10.8/30 from the EIGRP config

2) on the ISP1 router add the two routes mentioned above with a next hop of 192.168.10.9

3) 3750_2 - remove the network statement for 192.168.10.12/30 or 192.168.10.16/30 (whichever you used)

4) on ISP2 router add the two routes with a next hop of either 192.168.10.13 or 192.168.10.17 depending on subnet used

That should take care of that but, like I say, check the functionality of the routers before doing anything.

Hopefully you saw my post in an e-mail about the delay. Just in case you didn't, I need -
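As a quick sanity check on the addressing in the steps above, here is a minimal Python sketch that lists the two usable addresses in each /30 mentioned. The next hops quoted (192.168.10.9, .13 and .17) are the first usable address in each subnet, so the router LAN interface would presumably take the other one. That last part is an assumption, since the attached configs are not visible in this thread.

```python
import ipaddress


def usable_hosts(cidr):
    """Return the usable host addresses of a subnet as strings."""
    return [str(host) for host in ipaddress.ip_network(cidr).hosts()]


# The three /30 link subnets named in the steps above.
for cidr in ("192.168.10.8/30", "192.168.10.12/30", "192.168.10.16/30"):
    stack_side, other_side = usable_hosts(cidr)
    # stack_side matches the next hop quoted in the thread; other_side is
    # the only address left for the router LAN interface (assumption).
    print(cidr, "next hop:", stack_side, "remaining address:", other_side)
```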

1) 3750_3   "sh int fa1/0/47" and "sh int fa1/0/48"

2) from 3750_1 and 3750_2 "sh int po1"

so I can work out the correct delay to use on 3750_3.

When you actually get round to the downtime configuration there are certain things at each stage you need to test to make sure they are working. Do you need me to write out a very short test document you can use to make sure everything is working as it should?

Finally, if the links from the stacks to 3750_3 only support trunks, I can modify the config.

Jon

Jon Marshall

Last post of the day (unless you post something).

I have been working through all the possible failures and how the solution will react. So some general points and a few more questions -

1) if the etherchannel link fails (ie both physical links) with the new solution the servers in each site are isolated from each other. This would not happen with your current setup because STP would unblock the 3750_2 to 3750_3 link and the servers would have a path via the 3750_3.

I don't know how much of a concern this is for you. There is no way to overcome this with the new design.

2) currently if the backup servers need internet access they go via ISP2. With the new solution that is the same. However this could be changed so they use ISP1 unless it fails.

3) more importantly if the uplink between 3750_1 and 3750_3 fails with the new solution traffic to the internet goes via ISP2 because 3750_2 has a default route pointing to ISP2 even though ISP1 is still up (this would happen with your current solution as well).

Is this what you want or do you want traffic to go via 3750_2 to 3750_1 and then to ISP1 ?

If you do still want to go via ISP1 then we can modify the configs. It wouldn't be more config, we would actually remove a bit of config from 3750_2.

Again it is a tradeoff ie. if we modify the config so that traffic goes via 3750_2 to 3750_1 to ISP1 then if the uplink between 3750_1 and 3750_3 goes down the servers in site 2 are cut off from the internet (but not the other servers). If your 3750_3 stack is actually running IP Services then even that wouldn't happen.

I doubt you do need internet access for the backup servers so it is probably just a question of how you want clients to route to the internet if the 3750_3 to 3750_1 uplink fails.

Apart from that, if you are working on 3750_3 tomorrow can you do a "sh version" and see what feature set it is running.

One last thing. If you need to test whether the routers can use static routes to internal subnets you could just try adding routes in the ISP1 router for the vlan 50 and vlan 57 IP addresses on 3750_1, and the next hop would be the vlan 10 interface IP address. None of that should have an effect on the current setup, especially as you are not running them anyway.

Jon

Hi Jon,

I will get the output for the commands to calculate the delay this evening.

The routers do support static routes, so we will follow the instructions you posted.

See below for answers to questions/comments

1) if the etherchannel link fails (ie both physical links) with the new solution the servers in each site are isolated from each other.

This is fine. If both links are down, we will be required to go onsite to fix them anyway.

2) currently if the backup servers need internet access they go via ISP2. With the new solution that is the same. However this could be changed so they use ISP1 unless it fails.

As long as servers in both sites can be on the same vlan, it's ok. We actually would like them to be on ISP2.

3) Is this what you want or do you want traffic to go via 3750_2 to 3750_1 and then to ISP1?

This would be ideal, however, we need to always have servers in Site 2 connected to internet, in the event of a power failure in Site1 (due to there being remote users).

Maybe in the future, they may be willing to upgrade to IP Services for 3750_3, but for right now, we'll just have them failover to ISP2. (We'll try to convince them, but may be difficult to do right away, as IP services licenses are expensive)

Just one question, if the uplink between 3750_1 and 3750_3 fails, will users still be able to get to servers in Site1 through the 3750_3->3750_2->3750_1 path, even if they cannot route to ISP1?

We want to avoid having to connect users to the backup servers in Site2 unless all internal links to Site1 are down.

If this is not possible without IP Services on 3750_3, please let me know.

Just caught this before logging off.

As long as servers in both sites can be on the same vlan, it's ok. We actually would like them to be on ISP2.

This would be ideal, however, we need to always have servers in Site 2 connected to internet, in the event of a power failure in Site1 (due to there being remote users).

Maybe in the future, they may be willing to upgrade to IP Services for 3750_3, but for right now, we'll just have them failover to ISP2. (We'll try to convince them, but may be difficult to do right away, as IP services licenses are expensive)

If the power fails in site 1 then everything switches to site 2 so you should be fine.

When I wrote that the servers in site 2 would be isolated from the internet if the 3750_1 to 3750_3 uplink failed, that was a typo. Sorry, it's been a long day and I have done a lot of posts in this thread. They wouldn't of course, because they could still route direct to 3750_1 over the etherchannel.

I don't think you actually need IP Services because if the etherchannel fails you need to fix it onsite.

So it sounds like you want clients to still use ISP1 even if their direct link to 3750_1 fails. If so then it is easy to do this with a simple config change on 3750_2. The only thing this would mean is that in normal operation, ie. everything is up, the servers in site 2 use ISP1 for internet. Obviously if ISP1 fails then the servers in site 2 use ISP2, as do the clients.

Would this be okay because you seem to be suggesting above that you want the site 2 servers to use ISP2 ?

Just one question, if the uplink between 3750_1 and 3750_3 fails, will users still be able to get to servers in Site1 through the 3750_3->3750_2->3750_1 path, even if they cannot route to ISP1?

We want to avoid having to connect users to the backup servers in Site2 unless all internal links to Site1 are down.

Yes they will. Both 3750_1 and 3750_2 will be advertising a route to vlan 10 to 3750_3. We are going to add a delay to prefer the 3750_1 link but if that link failed the clients would simply route via 3750_2.
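To make the fallback path concrete, here is a small Python sketch that treats the three stacks as a graph and finds a path from 3750_3 to 3750_1 with and without the direct uplink. It is only an illustration: it uses hop count rather than the EIGRP metric, and the `LINKS` set and `shortest_path` helper are names made up for this example.

```python
from collections import deque

# The three inter-stack links described in the thread.
LINKS = {("3750_1", "3750_2"), ("3750_1", "3750_3"), ("3750_2", "3750_3")}


def shortest_path(src, dst, failed=frozenset()):
    """Breadth-first search over the links that are still up."""
    neighbors = {}
    for a, b in LINKS:
        if (a, b) in failed or (b, a) in failed:
            continue  # skip links marked as down
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in sorted(neighbors.get(path[-1], [])):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no path left


print(shortest_path("3750_3", "3750_1"))
print(shortest_path("3750_3", "3750_1", failed={("3750_1", "3750_3")}))
```

With the direct uplink marked as failed, the only remaining path runs through 3750_2, which is the 3750_3->3750_2->3750_1 path asked about.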

So basically depending on your answer to the above what we will have is -

1) 3750_1 and 3750_2 both advertising a route to vlan 10 but with the delay 3750_1 will be preferred by 3750_3

2) if you are happy for site 2 servers to use ISP1 then 3750_1 would be advertising a default route to 3750_2 and 3750_3. 3750_2 does not advertise its own default route. That way all traffic goes to 3750_1.

3) if the 3750_1 to 3750_3 link fails then 3750_2 still gets the default route and will advertise it to 3750_3 so all traffic will go 3750_3 to 3750_2 to 3750_1

4) if ISP1 fails 3750_1 stops advertising a default route. 3750_2 then advertises a default route and all traffic then goes to 3750_2 for internet traffic. Note that 3750_1 receives this default route from 3750_2 so servers in site 1 can get to the internet. 3750_1 also passes this default route to 3750_3.

So 3750_3 gets a default route direct from 3750_2 but then adds the delay on its link. It also gets a default route from 3750_1 (the one 3750_1 received from 3750_2) so we need to make sure that -

if ISP1 is up the delay makes it the preferred link

if ISP1 is down then even with the delay the 3750_2 link is still preferred.

That's why I said it may need some tweaking when you implement it.

The only downside to this is what I said before ie. failover would be slower because 3750_2 has to realise 3750_1 is no longer advertising a default route via EIGRP and then install its own. You wouldn't get this if both 3750_1 and 3750_2 generated a default route each, but then if the 3750_1 to 3750_3 link failed all traffic would use ISP2.

So there is a tradeoff in terms of routing the way you want vs failover speed.

There are always ways round these things, so we could have 3750_1 and 3750_2 generate default routes and use PBR etc. to still force traffic down the path we want, but I would recommend against this sort of thing as it complicates the config and I think we should leave it up to the routing.

Apologies for all the long winded explanations but I want you to fully understand how it is all going to work, so when you implement it and maintain it, it all makes sense.

I will revisit this tomorrow to make sure I haven't made any glaring errors but I can't see any at the moment. I'll let you know the config change depending on your answer about site 2 servers to internet.

In terms of testing you really need to test everything ie. ISP failure, failure of the uplinks to 3750_3, etherchannel failure if you can. I can help with what you should see and how it should work if you need that.

Jon

Jon Marshall

Are you absolutely sure you want traffic to still go to ISP1 even if the 3750_1 to 3750_3 link fails ?

I only ask because that would mean a lot of internet traffic across the etherchannel interconnect between 3750_1 and 3750_2.

It's not a problem if you do and maybe ISP2 is not as good a link etc. just wondered why really.

Jon

Would this be okay because you seem to be suggesting above that you want the site 2 servers to use ISP2 ?

Please ignore that. I meant to say that we need to be able to get to servers in SiteB from outside over ISP2.

If all servers in both sites are routing to the internet via ISP1, it's ok.

We are also fine with the internet traffic over the etherchannel. They will soon upgrade to using fiber connections instead of copper.

Also, it looks like I skipped a few of your questions.

1) Routers won't run EIGRP, as they are not Cisco.

2) Router uplinks will be on port 1/0/1 of each of the IP services stacks.

3) All the links between sites are provided by a 3rd party, but they should all be L2 links, so we can do the routing across them, not only trunks.


Hi Jon,

Ready to start implementation, if you can help us with the last part of the config to have the internet traffic go over the etherchannel interconnect.

Also attached the interface outputs.

On 3750_2 -

no ip route 0.0.0.0 0.0.0.0 

ip route 0.0.0.0 0.0.0.0 AD  <--- where AD is > 170 ie. anything between 171 and 254

The AD is very important. If it isn't added, traffic will go straight out to ISP2 if the 3750_1 to 3750_3 link fails.

I will have a look at delays in a minute and let you know what to apply and what to expect on 3750_3.
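To show why the AD matters, here is a minimal Python sketch of how 3750_2 would choose between its own static default and the default learned from 3750_1. It assumes the learned default arrives as an external EIGRP route (Cisco's default AD for those is 170, which matches the "> 170" rule above). The value 200 is just an example AD, and the route labels are made up for this illustration.

```python
from dataclasses import dataclass

EIGRP_EXTERNAL_AD = 170  # Cisco default AD for external EIGRP routes
DEFAULT_STATIC_AD = 1    # AD a static route gets when none is specified


@dataclass(frozen=True)
class Route:
    source: str
    admin_distance: int


def default_on_3750_2(isp1_up, static_ad):
    """Return which default route 3750_2 installs (lowest AD wins)."""
    candidates = [Route("static to ISP2", static_ad)]
    if isp1_up:
        # 3750_1 only advertises its default while ISP1 is up.
        candidates.append(Route("EIGRP default from 3750_1", EIGRP_EXTERNAL_AD))
    return min(candidates, key=lambda route: route.admin_distance).source


print(default_on_3750_2(isp1_up=True, static_ad=200))
print(default_on_3750_2(isp1_up=False, static_ad=200))
print(default_on_3750_2(isp1_up=True, static_ad=DEFAULT_STATIC_AD))
```

The last call shows the failure mode described above: without the higher AD the static route always wins, so traffic reaching 3750_2 goes straight out to ISP2 even while ISP1 is up.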

I don't have anything to test with so you will have to see how quickly it fails over when ISP1 goes down. The process of events should be -

1) ISP1 goes down.

2) 3750_1 removes its default route and stops advertising it to the others via EIGRP

3) 3750_2 then installs its own default route and advertises it out to the other stacks.

This should be fairly quick with EIGRP as there are only three EIGRP neighbors. If you run a continuous ping from a client in site 3 then disconnect ISP1 you will lose pings but it should then start working via 3750_2.  If it is taking too long for you we can look at alternate solutions.  Use traceroute as well to make sure it is going the right way.
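If you want a number for the failover time rather than eyeballing the ping window, something like the Python sketch below could be used. It assumes you have already turned the ping output into a list of (timestamp in seconds, reply received) pairs. That input format and the `longest_outage` name are assumptions for this example, not something produced by IOS or by ping itself.

```python
def longest_outage(samples):
    """Return the longest gap, in seconds, from the first lost ping of a
    run of losses to the next successful reply.

    samples: list of (timestamp_seconds, reply_received) sorted by time.
    A run of losses that never recovers before the end is not counted.
    """
    longest = 0.0
    outage_start = None
    for timestamp, reply_received in samples:
        if not reply_received and outage_start is None:
            outage_start = timestamp  # first lost ping of a new outage
        elif reply_received and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    return longest


# Hypothetical one-per-second ping results around an ISP1 disconnect.
samples = [(0.0, True), (1.0, True), (2.0, False),
           (3.0, False), (4.0, False), (5.0, True)]
print(longest_outage(samples))
```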

When testing all the link failure scenarios you should, if possible, ping and traceroute from a client to see how it is going. I might ask for EIGRP outputs on some of the stacks if needed to see exactly what is happening, if that's okay.

Note none of the above should affect vlan 10. You should be able to run a continuous ping to a server in site1 when you take down ISP1 and not lose a single packet.

Finally, until we add the delay, traffic from site 3 will see two equal cost routes so it could go either way.

Good luck with the implementation and let me know how it all goes.

Jon

Just responded to your PM with contact details.

Jon

Jon Marshall

Thanks for the interface outputs. The delay on the 3750_3 to 3750_2 connection has been altered because of the current config, but that will be removed once you make the config changes I suggested.

So each interface has a default delay of 100 usec. This means when 3750_2 is generating the default route ie. ISP1 is down then the route is sent direct to 3750_3. It is also sent to 3750_1 and then 3750_1 sends it to 3750_3.

The route via 3750_1 will have 200 usec ie. the 3750_1 to 3750_2 link plus the 3750_3 to 3750_1 link.

We need to add a delay to the 3750_3 so the route to vlan 10 from 3750_1 is preferred, but we also need to make sure the delay is not >= 200 usec, so that when ISP1 fails the default route received direct from 3750_2 is preferred.

So if my maths is correct, the delay to add to the 3750_3 connection to 3750_2 should be a value > 100 and < 200 usec.

You need to express the delay in tens of microseconds so pick a value and divide by 10.
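The arithmetic above can be sketched in a few lines of Python. This is only the simplified model used in this thread: it compares delay alone and takes the 100 usec and 200 usec figures from the posts above as the bounds. The function name is made up for this example, and you should still confirm the real interface delays with "sh int".

```python
DIRECT_VIA_3750_1_USEC = 100   # vlan 10 route learned direct from 3750_1
DEFAULT_VIA_3750_1_USEC = 200  # ISP2 default relayed 3750_2 -> 3750_1 -> 3750_3


def valid_delay_settings(lower_usec, upper_usec):
    """Return the values, in tens of microseconds, that fall strictly
    between the two bounds (which are given in microseconds)."""
    return [tens for tens in range(1, upper_usec // 10 + 1)
            if lower_usec < tens * 10 < upper_usec]


for tens in valid_delay_settings(DIRECT_VIA_3750_1_USEC, DEFAULT_VIA_3750_1_USEC):
    print("delay value", tens, "=", tens * 10, "usec")
```

Any value in the middle of that range leaves some margin on both sides, which may help if the real interface delays turn out slightly different from the 100 usec assumed here.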

After you have added this command to the fa1/0/48 interface on 3750_3 do a "sh int fa1/0/48" to make sure the delay is showing as you expected.

What you should then see in 3750_3's routing table is -

1) one route for vlan 10 via 3750_1 and one default route via 3750_1

if you then fail the ISP1 link you should then see -

2) one route for vlan 10 via 3750_1 and one default route via 3750_2
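The two expected routing table states can be checked against the same simplified delay model with a short Python sketch. It assumes every other interface has the 100 usec delay mentioned above and that bandwidth is equal on all paths, so only the summed delay decides. The 150 usec value and the function name are examples only.

```python
LINK_DELAY_USEC = 100  # default per-interface delay quoted in the thread


def preferred_next_hops(isp1_up, delay_to_3750_2_usec):
    """Return the next hop 3750_3 would prefer for vlan 10 and for the
    default route, choosing the path with the lowest summed delay."""
    # Both IP Services stacks advertise vlan 10 directly to 3750_3.
    vlan10 = {"3750_1": LINK_DELAY_USEC, "3750_2": delay_to_3750_2_usec}
    if isp1_up:
        # Default originates on 3750_1; the path via 3750_2 adds a hop.
        default = {"3750_1": LINK_DELAY_USEC,
                   "3750_2": LINK_DELAY_USEC + delay_to_3750_2_usec}
    else:
        # Default originates on 3750_2; the path via 3750_1 adds a hop.
        default = {"3750_1": LINK_DELAY_USEC + LINK_DELAY_USEC,
                   "3750_2": delay_to_3750_2_usec}
    return {"vlan 10": min(vlan10, key=vlan10.get),
            "default": min(default, key=default.get)}


print(preferred_next_hops(isp1_up=True, delay_to_3750_2_usec=150))
print(preferred_next_hops(isp1_up=False, delay_to_3750_2_usec=150))
```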

Note that I have been assuming in all of this that once ISP1 fails all traffic should go direct to 3750_2 ie. it does not go to 3750_1 over the etherchannel to 3750_2 and then out to ISP2.

If you still want internet traffic to go via 3750_1 to 3750_2 when ISP1 fails then we still need a delay but we can set it to anything we want so 3750_1 is always preferred for everything unless of course 3750_3 to 3750_1 is down.

Let me know which way you want internet traffic to go if ISP1 fails.

Jon

Hi Jon,

Again thanks.

We don't need internet traffic to go via 3750_1 to 3750_2 when ISP1 fails.

We just need to make sure that users still have access to servers in Site1.

Please see my PM.

Hopefully we will have everything up and running smoothly soon.
