Infrastructure Design Help Please

network2013 · 01-21-2013

Hi

I really need some help with my company’s infrastructure design please. It’s working but I am concerned about the design. When I originally designed it and put it into practice we had some issues, and in a rush I changed things just to get it all working. It is a live system now. The design is attached.

I have a few questions:

1. I have hard-set the outbound routes to the internet using IP SLA: the first static route is used and, if that fails, the second route takes over. It’s a redundant scenario. Is this bad design, and what would be a good way for us to start load balancing?

2. I have two BGP peers, but the way I am advertising means that only the first peer ‘should’ ever receive inbound traffic on Router 1. If the first peer goes down, the internet will route traffic through the second peer. I want to start using both BGP peers for inbound traffic. We run a lot of video conferencing. Will I have trouble with NAT if traffic goes out one peer and returns through the other?

3. I thought I was experiencing internal routing issues, so I tweaked the routing to only go via Router A and Switch A. I used static routes, OSPF with better metrics, and HSRP to make Router A and Switch A the best paths. Basically, in the attached diagram only Router A and Switch A are actually doing any work! What would be a better way to design the LAN?

Any general recommendations on improving the design would be much appreciated.

Many thanks!

shillings · 01-22-2013

Nice diagram.

I've not implemented the exact same solution, so these are just points for you to consider.

1. IP SLA works well as an alternative to receiving BGP default routes from your ISPs. IP SLA provides more insight into the ISP's network, because you can ping multiple IPs within their cloud, whereas BGP default routes will only be withdrawn if the eBGP peer is lost. I prefer IP SLA, but it doesn't make a lot of difference either way.

2. If you were multihomed to the same ISP, then you could split your prefix in two and advertise half via each circuit.

Am I right in thinking that, prior to the OSPF modification mentioned above, each ASR1001 advertised a default route into OSPF and each core/distribution layer switch saw two equal-cost default routes in its IP routing table? If so, then for that method to work, you must somehow ensure the same source LAN IPs always use the same gateway ASR1001, for example by applying Policy Based Routing (layer 3) or by load balancing the VLANs (layer 2). Bear in mind the CPU impact of doing PBR.

3. Yes, I can understand why you've done this. This is one of the advantages of an active/standby ASA pair: they force you towards a similar design that draws all traffic through a single NAT-enabled device. Only once outbound traffic is on the outside of the active ASA is it load balanced across a pair of WAN circuits, ideally using per-destination IP CEF, which doesn't impact CPU. However, ASAs are designed to fail over and have a dedicated failover link (heartbeat). Also, the standby ASA can send out probes to specific interfaces via either the LAN or the WAN interface.
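
To make point 2 a bit more concrete, here is a minimal Python sketch of the two ideas: splitting a prefix into two halves (one per circuit) and pinning each source LAN address to one gateway so its NAT state stays on a single router. The prefix 198.51.100.0/24, the gateway names and the modulo rule are all made up for illustration. On real kit this would be done with BGP advertisements and PBR or per-VLAN gateways, not Python.

```python
import ipaddress

def split_prefix(prefix):
    """Split a prefix into its two equal halves, one to advertise per circuit."""
    net = ipaddress.ip_network(prefix)
    return list(net.subnets(prefixlen_diff=1))

def gateway_for_source(src_ip, gateways):
    """Pin a source IP to one gateway so its outbound NAT always uses the same router.

    Hypothetical rule: integer value of the address modulo the number of gateways.
    """
    return gateways[int(ipaddress.ip_address(src_ip)) % len(gateways)]

# Documentation prefix, not a real allocation.
halves = split_prefix("198.51.100.0/24")
print([str(h) for h in halves])  # ['198.51.100.0/25', '198.51.100.128/25']

# Hypothetical gateway names; the same source always lands on the same one.
print(gateway_for_source("10.1.20.15", ["ASR-1", "ASR-2"]))  # ASR-2
```

The point of the second function is only that the choice depends on the source address and nothing else, which is what keeps a given host's translations on one router.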

network2013 · 01-22-2013

Thank you for your response. I need some time to think over a few things but these are my first thoughts.

1. I like how IP SLA works. It's simple, so I might stick with it for now. I was also considering receiving full internet routing tables from my two peers. In your experience, do you think we might see benefits from routing between ISPs that have better paths? (Yes sounds like the logical answer, but I'm interested in someone else's thoughts.)

2. Yes I was injecting default routes with the same metrics but I later changed the metrics to force traffic out of Router1. I will take a look at PBR. I have never used it but I do understand it.

3. I have two ASA5520s I can use. If the NAT translations can be performed on the ASAs, that might help. Would I have the ASAs behind my two ASRs? From what I just read, ASAs can't do load balancing and can only have one default route, so I presume that's why you refer to load balancing with CEF on the ASRs, right? Since I have two ASRs, would I need to point the two ASAs at a VIP address between the ASRs and maybe use GLBP to load balance on the egress?

Big thanks!

shillings · 01-22-2013

Just taking a step back, Paul: is the existing solution actually working, i.e. does it load balance and does it fail over OK? It's possible that you are best leaving it alone, especially if the solution is relatively easy to understand and manage. If not, then can you test any proposed solution in a lab?

Attached are a couple of diagrams. I tested Topology A in a lab and it was implemented successfully. Topology B is a simplified version that remains untested. I'm sorry they are a bit basic, but I don't have time right now to flesh them out.

Topology A (TA) used an IGP between the WAN Edge routers and the L3 switches, with redistribution between OSPF and BGP, carefully filtered of course. IP CEF was used to load balance outbound traffic. The ISP circuits were both 100Mbps. The WAN Edge routers were 2911s, the WAN L3 switches were 3560-X series, the firewalls were 5520s and the core was 6500s. Inbound load balancing used the BGP AS_PATH attribute as described in a previous post. HSRP ran on both the inside and the outside to provide FHRP for the ASA active/standby pair.
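
As a rough illustration of how per-destination load balancing keeps a flow on one circuit, the sketch below picks an egress path from a hash of the source and destination addresses. The circuit names and addresses are hypothetical, and SHA-256 is only a stand-in: it is not the hash a router actually uses.

```python
import hashlib
import ipaddress

def pick_path(src_ip, dst_ip, paths):
    """Choose an egress path from a hash of the (source, destination) pair.

    The same pair always maps to the same path, so a flow never flips circuits.
    """
    key = ipaddress.ip_address(src_ip).packed + ipaddress.ip_address(dst_ip).packed
    digest = hashlib.sha256(key).digest()
    return paths[int.from_bytes(digest[:4], "big") % len(paths)]

paths = ["ISP-A", "ISP-B"]  # hypothetical circuit names
for dst in ["192.0.2.10", "192.0.2.11", "203.0.113.5"]:
    print(dst, "->", pick_path("198.51.100.7", dst, paths))
```

Because the choice is made per address pair rather than per packet, traffic spreads across circuits only when there are many different pairs; a single heavy flow still uses one circuit.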

Topology B (TB) avoids the need for the IGP described above and, in theory, could instead rely heavily on IP SLA pings tied to static routes. I think it's a simpler design to manage. There is a lot of detail in TA that is not obvious until you try to make it work, especially when a similar implementation was split across two sites several km apart. It was not easy to document and explain, especially when the customer needed all of the (dozens of) possible failure scenarios documented.
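
The logic of static routes tied to IP SLA pings can be sketched in a few lines of Python. This is a toy model, not router config: the next hops, probe targets and distances are invented, and the rule that a route stays usable while at least one of its probe targets answers is just one possible tracking policy.

```python
from dataclasses import dataclass

@dataclass
class TrackedRoute:
    next_hop: str
    distance: int        # administrative distance; lower wins
    probe_targets: list  # addresses the probe pings (hypothetical)

def select_route(routes, reachable):
    """Return the usable route with the lowest distance, or None if none is usable.

    Assumed policy: a route is usable while any of its probe targets answers.
    """
    usable = [r for r in routes if any(t in reachable for t in r.probe_targets)]
    return min(usable, key=lambda r: r.distance, default=None)

routes = [
    TrackedRoute("192.0.2.1", 1, ["192.0.2.1", "192.0.2.53"]),   # primary
    TrackedRoute("203.0.113.1", 10, ["203.0.113.1"]),            # backup
]

# Primary still has one answering target, so it stays active.
print(select_route(routes, {"192.0.2.53", "203.0.113.1"}).next_hop)  # 192.0.2.1
# All primary targets silent: the backup takes over.
print(select_route(routes, {"203.0.113.1"}).next_hop)                # 203.0.113.1
```

Probing more than one address inside the ISP is what gives the extra insight mentioned earlier in the thread: a single dead target does not pull the route.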

network2013 · 01-23-2013

Thanks for your response and your time,

Everything is working OK, but we are not load balancing between ISPs. After some BGP tweaking, all traffic is going in and out of the same router via just one primary peer.

The second peer does work, but when I disconnected Peer1 it took 3 seconds to fail over to Peer2 for outbound routing. I think I need to review the IP SLAs. It was a bit rushed when I first configured them.

I was also running a continuous ping from a separate internet connection to one of our public IPs and tested the failover. It took 2-3 minutes! That seems a lot. Do you think that is a normal time for the upstream routers to reroute via their alternate paths?

I should build a lab like you say and play with a few different scenarios.

shillings · 01-23-2013

Sorry to say, I can't recall what sort of failover times I've seen in the past, but 2-3 minutes doesn't particularly surprise me. You could ask your ISP to tune their BGP timers down a bit.
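
A rough way to see why 2-3 minutes is plausible: when a peer goes silent without the link itself going down, the session is only declared dead once the hold timer expires, and the hold time in use is the lower of the two values the peers offer. The sketch below assumes the common Cisco IOS defaults of a 60-second keepalive and a 180-second hold time; the ISP's real values may differ, and the time for the change to spread through other networks comes on top.

```python
def negotiated_hold_time(local_hold, peer_hold):
    """BGP uses the smaller of the two hold times offered by the peers."""
    return min(local_hold, peer_hold)

def worst_case_detection(local_hold, peer_hold):
    """Seconds until a silent peer is declared down.

    The hold timer restarts on every KEEPALIVE or UPDATE received, so the
    worst case is one full hold time after the last message.
    """
    return negotiated_hold_time(local_hold, peer_hold)

# Assumed defaults on both sides: 180 s hold time.
print(worst_case_detection(180, 180))  # 180, i.e. 3 minutes

# Hypothetical tuned value on one side: both peers end up using 30 s.
print(worst_case_detection(180, 30))   # 30
```

Note that lowering the timer on only one side is enough to shorten detection, because the smaller value wins.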

At least if you stay as you are, you can fail over without any drop in throughput, presuming the bandwidth is the same on each circuit.