VPN Redundancy White Paper Available

vcjones
Level 5

How to use multiple VPNs for higher availability, or how to provide dial backup for a VPN, tends to be a FAQ on this forum. A white paper discussing the issue and providing Cisco configurations for two example application scenarios is posted on the Networking Unlimited web site (http://www.networkingunlimited.com/white009.html).

Redundant Routes in IPSec VPNs

"Building a virtual private network (VPN) using IP Security Protocol (IPSec) is a popular cost-saving approach to wide area networking. One disadvantage of using a VPN is the scarcity of convenient tools to provide resilience in the face of router, firewall, or network failure. The challenge is to automatically detect failure of an IPSec connection so that an alternate route can be used. This white paper looks at two different approaches Networking Unlimited, Inc. has used to meet the challenge: using a GRE tunnel to make the IPSec transport appear as a point-to-point link, and using BGP directly over the IPSec transport. Example Cisco router configurations are provided for each approach."

Happy reading :-)

Vincent C Jones

www.networkingunlimited.com

Author of High Availability Networking with Cisco

3 Replies

gfullage
Cisco Employee

Thanks for that.

There's also a good sample config for redundancy with two routers at one site using HSRP and the high-availability IPSec features detailed here:

http://www.cisco.com/warp/public/707/ipsec_feat.html

Interesting that we have three sample configurations, each of which attacks the challenge of VPN availability under _very_ different circumstances. Yet another example of the need to understand what the problem is before worrying about designing a solution. My experience has been that WAN link failure is the dominant cause of VPN outages, so my solutions have focused on how to provide maximum link diversity at minimum cost.

I should also point out that there are several other configuration examples on CCO which anyone pursuing redundant VPNs should investigate. For example, a simple search for "ipsec routing gre" (without the quotes) will turn up multiple routing-over-IPSec examples. However, the last time I searched, the use of BGP to avoid GRE tunnels was an approach unique to Networking Unlimited.

Vincent C Jones

www.networkingunlimited.com

Actually Vincent, the use of BGP for path determination in combination with crypto is probably more common than you would expect. Here are a few hypothetical examples and a few of the issues that arise in running meshed encryption between discrete organizations:

Let's suppose you have a GRE multipoint network of a few hundred nodes, and that a number of organizations participate on this mesh. For the sake of argument, let's say that the network is a flat mesh and only traffic for unique public addresses will be passed through it.

There are a number of communities of interest. These are based upon geography (state) and function (e.g. defence).

Any organization may require its node to participate only in defined communities of interest (e.g. only other defence organizations, unioned with organizations in Florida). This allows the different organizations to size their encrypters based upon their own security needs and not have to encrypt traffic unnecessarily.

So that the network is scalable, it is an absolute requirement that as a node joins, leaves, or changes configuration, changes should only be required on the affected node and on the management servers (route-servers or similar).

Now, we can make this happen by using Tunnel Endpoint Discovery (or the newer crypto profiles) and BGP, and by using certificates for authentication. By using some clever configuration on the BGP route-servers, we can have routes advertised with AS-PATH and community attributes that indicate which communities of interest they participate in, and that way individual nodes can accept only the routes relating to communities of interest in which they participate. Those sites wanting redundancy or some form of load balancing get two nodes at their site, and inject previously agreed routes from their normally active node to the route-servers, which in turn distribute them. Should the normally active node fail, the route-servers advertise the routes via the standby node.
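To make the community-based acceptance rule concrete, here is a minimal Python sketch of the filtering logic only. It is not router configuration and not anyone's actual implementation: the `Route` type, the community names, and the documentation-range addresses are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str             # advertised public prefix (illustrative)
    next_hop: str           # mesh address of the advertising node (illustrative)
    communities: frozenset  # community-of-interest tags set by the route-server


def accept_routes(routes, my_communities):
    """Keep only routes that share at least one community of interest with this node."""
    mine = frozenset(my_communities)
    return [r for r in routes if r.communities & mine]


# Hypothetical advertisements as a route-server might distribute them.
advertised = [
    Route("192.0.2.0/24", "198.51.100.1", frozenset({"defence", "florida"})),
    Route("203.0.113.0/25", "198.51.100.2", frozenset({"health", "texas"})),
    Route("203.0.113.128/25", "198.51.100.3", frozenset({"education", "florida"})),
]

# A hypothetical node that participates in "defence" unioned with "florida".
accepted = accept_routes(advertised, {"defence", "florida"})
print([r.prefix for r in accepted])
```

The point of the sketch is that the acceptance decision depends only on the node's own list of communities, so adding or removing another node does not require touching this node's configuration.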

There are heaps of issues that arise in this sort of situation. For a start, from a security perspective, it’s not very pretty having organizations that have no need to communicate securely holding cryptographic credentials, in the form of certificates, that are acceptable to each other, particularly when they are effectively “one hop away” on the GRE mesh. We can’t use multiple certificates or certificate attributes for separation, as there are too many cryptographic domains and it would break our scalability requirement. So we’ll have to go back to BGP and work out a way of distributing the public addresses assigned to the routers of participating nodes that don’t share common communities of interest, in such a way that these routes are routed to null. This is still a bad way of doing things, particularly when some of the routers still reorder their certificates in NVRAM on reboot, causing crypto to fail. What might be better is a Kerberos arrangement, where each router uses RSA nonces to reach a Kerberos server via IPSec and then retrieves the necessary crypto credentials from there. That way too, because the RSA keys are needed to reach the Kerberos server, someone attempting password recovery on any node is left with no crypto credentials at all, as RSA keys don’t survive password recovery where the config is changed and written. Using Kerberos also means that we could better keep track of the flux in cryptographic domains and apply better security policies to them.
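The null-routing idea can be sketched in Python as the complement of the community match: any peer whose router shares no community of interest with the local node becomes a candidate for a route to null. The function name, peer addresses, and community names below are hypothetical, and how such routes would actually be distributed over BGP is not modelled.

```python
def null_route_targets(peers, my_communities):
    """Return router addresses of peers that share no community with this node."""
    mine = set(my_communities)
    return sorted(addr for addr, comms in peers.items() if not (set(comms) & mine))


# Hypothetical mapping of peer router address to its communities of interest.
peers = {
    "198.51.100.1": {"defence", "florida"},
    "198.51.100.2": {"health", "texas"},
    "198.51.100.3": {"education", "florida"},
}

# A node that participates only in "defence" would null-route the other two peers.
blocked = null_route_targets(peers, {"defence"})
print(blocked)
```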

We’ll also have to configure policy-routing on all the participating nodes, so that traffic from hosts behind a node that don’t participate in the crypto network are never routed through the crypto network.
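As a rough illustration of the policy-routing decision, the Python sketch below picks a path based purely on the source address. The prefix list, the path labels, and the function name are assumptions for the example; a real deployment would express this as router policy rather than code.

```python
import ipaddress

# Hypothetical prefixes of local hosts that participate in the crypto network.
PARTICIPATING = [ipaddress.ip_network("192.0.2.0/25")]


def select_path(src_ip):
    """Return 'crypto' for participating sources and 'internet' for everyone else."""
    src = ipaddress.ip_address(src_ip)
    if any(src in net for net in PARTICIPATING):
        return "crypto"
    return "internet"


print(select_path("192.0.2.10"))   # host inside the participating prefix
print(select_path("192.0.2.200"))  # host behind the same node, not participating
```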

So that there is a little bit of enforcement applied on incoming traffic from the Internet, we’ll need to apply reverse-path forwarding (RPF) on the “outside” interface of the encrypter in conjunction with ACLs so that it leaves traffic for non-participating hosts alone. Watch out for extended ACLs with RPF: it wasn’t until later 12.2 IOS versions that it started to assess the destination component of the ACL. This isn’t perfect either, because although in this situation RPF can guarantee that traffic which should have been delivered encrypted actually is, it can’t guarantee that it came from the correct node on the GRE mesh. In order to do that we’d need some extensions to reverse-path forwarding for GRE that enable the router to inspect the GRE wrapper applied to the tunneled packet and ensure that the GRE traffic was sourced from the correct node. But we’d also need an exception rule here for traffic coming from the Next Hop Server and possibly other hosts where we might encounter routing asymmetry within the GRE mesh.
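For readers less familiar with the check itself, here is a small Python sketch of a strict reverse-path test: a packet passes only if the longest-prefix route back to its source points out of the interface it arrived on. The routing table, interface names, and addresses are invented for the example, and the interaction with ACLs described above is deliberately left out.

```python
import ipaddress

# Hypothetical routing table: (prefix, outgoing interface).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "outside"),
    (ipaddress.ip_network("192.0.2.0/24"), "tunnel0"),
    (ipaddress.ip_network("192.0.2.128/25"), "tunnel1"),
]


def best_interface(addr):
    """Longest-prefix match: return the interface of the most specific route."""
    ip = ipaddress.ip_address(addr)
    matches = [(net, ifc) for net, ifc in ROUTES if ip in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def rpf_pass(src, arrival_interface):
    """Strict check: the route back to the source must use the arrival interface."""
    return best_interface(src) == arrival_interface


# A source reachable via tunnel0 fails the check when it shows up in the clear
# on the outside interface.
print(rpf_pass("192.0.2.10", "tunnel0"))
print(rpf_pass("192.0.2.10", "outside"))
```

Note that the check only looks at which interface the packet arrived on. It says nothing about which node on the mesh originated it, which is the limitation the paragraph above points out.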

Whilst we’re at it, some of the agencies need to ensure that some traffic is always sent encrypted or not sent, but it gets NAT’ed a few times before it gets to the agency gateway, so we’ll need to work out a way of coloring the traffic using DSCP bits lower in the network to indicate what traffic is what.
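The coloring itself is simple bit manipulation: the DSCP occupies the upper six bits of the IP ToS/DS byte, with the lower two bits used for ECN. The Python sketch below shows marking and reading that field. The code point chosen for "must be encrypted" is a made-up value for illustration, not a standard or agreed marking, and the assumption that the marking survives the NAT hops (which ordinarily rewrite addresses and ports rather than the DS field) would need checking on the actual devices.

```python
# Hypothetical code point meaning "send encrypted or do not send at all".
MUST_ENCRYPT_DSCP = 42


def set_dscp(tos, dscp):
    """Write a 6-bit DSCP into the ToS/DS byte, preserving the two ECN bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return (dscp << 2) | (tos & 0x03)


def get_dscp(tos):
    """Read the 6-bit DSCP from the ToS/DS byte."""
    return (tos >> 2) & 0x3F


def must_encrypt(tos):
    """Gateway-side test: was this packet colored as encrypt-or-drop lower in the network?"""
    return get_dscp(tos) == MUST_ENCRYPT_DSCP


marked = set_dscp(0x01, MUST_ENCRYPT_DSCP)  # original byte carries only an ECN bit
print(marked, get_dscp(marked), must_encrypt(marked))
```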

Of course, many governments are now complying with common criteria evaluation, and the Cisco IOS version that is approved on the Common Criteria is 12.2(6) in tunnel mode only, so we’d probably better come up with something that does all of the above in tunnel mode too, and not GRE, and watch out for those pesky SNMP and SSH bugs in 12.2(6) not to mention the RPF shortcoming.
