Best Practice for Network Circuit Failover Testing

wyko00001
Level 1

Our branch offices connect to a main MPLS circuit for day-to-day operations. In case of an outage, they are provisioned with a backup DSL circuit that builds a VPN tunnel back to corporate. On the router, both circuits connect to GigabitEthernet interfaces.

I would like to implement regular failover testing for these circuits, but I am having trouble finding any published best-practice guidelines for doing so. I suspect that simply shutting down the main interface is not the preferred method, since doing it manually can trigger certain processes on the router. I don't know whether having our ISP shut down their side of the circuit would be any better. My current best solution is simply having an onsite contact unhook the main Ethernet line, but for various reasons that is a bad way of doing things.

What are your thoughts?


Reza Sharifi
Hall of Fame

My current best solution is simply having an onsite contact unhook the main Ethernet line, but for various reasons that is a bad way of doing things.

That is actually not a bad test. The test you describe (shutting down the interface or unplugging the Ethernet cable from the router) simulates an ISP failure, which is correct.

Now, there are cases where there is no issue at layer 1 (cable, interface, etc.), so the physical interface does not go down, but something happens at layer 2 (for example, the provider changes the VLAN ID on one side of the connection) that causes layer-3 connectivity between your sites to fail. For cases like this, you can set up your router to ping the peer router every minute or every few seconds (using IP SLA) and, if the ping fails, send you an email alert, a text, or both, so you can get to it ASAP and call the provider to fix it.
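On the router itself, IP SLA does the probing; the alerting logic can be sketched separately. The minimal Python sketch below assumes one boolean per probe (True if the peer router answered) and raises one alert only when reachability changes, so a long outage produces a single "down" alert and a single "up" alert rather than an email every minute. The function name and message strings are hypothetical, chosen for illustration.

```python
from typing import Iterable, List, Tuple


def transition_alerts(probe_results: Iterable[bool]) -> List[Tuple[int, str]]:
    """Return (probe_index, message) for each change in peer reachability.

    probe_results: one bool per probe, True if the ping to the peer router
    was answered. Assumption: the peer is reachable before the first probe.
    """
    alerts: List[Tuple[int, str]] = []
    reachable = True  # assumed initial state
    for i, ok in enumerate(probe_results):
        if ok != reachable:
            # State changed: record it once instead of alerting on every probe.
            reachable = ok
            alerts.append((i, "peer reachable again" if ok else "peer unreachable"))
    return alerts


# Example: two failed probes in a row produce one "down" and one "up" alert.
print(transition_alerts([True, True, False, False, True]))
```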

There is also another case where, again, the physical connectivity is good and the interfaces are up, but you are dealing with latency. This happens a lot with Internet circuits. In this case, you can also set up IP SLA to ping one or two devices on the Internet (I use 8.8.8.8 and a name server) and, if the ping fails, take down the primary connection and use the backup until the primary connection is fixed again.
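The decision described above can be sketched in Python under a few stated assumptions: each Internet target is probed over the primary circuit, a target counts as healthy only if it replied within a round-trip threshold (the 150 ms default is a hypothetical value), and the primary is kept as long as at least one target is healthy. Requiring every target to fail is an assumed design choice that avoids failing over just because one remote host is down. The name-server address `192.0.2.53` is a placeholder from the documentation range.

```python
from typing import Dict, Optional


def select_path(rtts_ms: Dict[str, Optional[float]], max_rtt_ms: float = 150.0) -> str:
    """Return 'primary' or 'backup' from the latest probes over the primary circuit.

    rtts_ms maps each probe target to the round-trip time (ms) of its latest
    probe, or None if no reply was received. max_rtt_ms is a hypothetical
    latency threshold: slower replies count as failures.
    """
    if not rtts_ms:
        raise ValueError("configure at least one probe target")
    healthy = [rtt is not None and rtt <= max_rtt_ms for rtt in rtts_ms.values()]
    # Stay on the primary while any target answers in time.
    return "primary" if any(healthy) else "backup"


# 8.8.8.8 from the post; 192.0.2.53 stands in for a name server.
print(select_path({"8.8.8.8": 20.0, "192.0.2.53": None}))   # one target still healthy
print(select_path({"8.8.8.8": None, "192.0.2.53": 400.0}))  # both failed or too slow
```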

There are other cases I have seen, but that is all I remember for now.

HTH

e.ciollaro
Level 4

Hi,

I agree with Sharifi that not all failures involve layer 1, so shutting down the interface is not a good way to simulate that kind of failure. For example, the path from your site to the ISP's IP backbone usually crosses many devices: layer-1 devices such as ADMs, WDM or DWDM equipment, or layer-2 devices such as switches. In any case, you can't be sure that a failure in the middle of the path is propagated down to the port connected to your router (O&M protocols exist for this, but not all ISPs use them, especially between their devices and customer devices). So when a failure occurs in the path, the connection can be lost while your router interface is still up/up.

What I suggest is:

  • run a dynamic routing protocol between your routers and the ISP's routers (for MPLS, BGP is usually preferred), then simulate a failure by blocking the route advertisement. For example, if you use BGP, you can simply shut down the neighbor or ask your ISP to do it;
  • if your ISP has its own router at your site and there is a switch between your router and the ISP's router, you can shut down the switch interface that connects to the ISP router; this way the connection is lost but your router's interface is still up/up.
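Why shutting the BGP neighbor exercises the backup can be sketched as route selection by administrative distance. The Python sketch below assumes the MPLS route is learned via eBGP (Cisco default distance 20) and that the backup over the DSL VPN tunnel is a floating static route; the distance of 250 for that route and the next-hop labels are hypothetical. When the eBGP route is withdrawn, the only remaining candidate, the backup, is installed.

```python
from typing import Dict, Optional

# eBGP uses the Cisco default distance of 20. The floating static value of 250
# is a hypothetical choice for a backup route via the DSL VPN tunnel.
ADMIN_DISTANCE = {
    "ebgp": 20,
    "floating_static": 250,
}


def best_route(candidates: Dict[str, str]) -> Optional[str]:
    """Return the next hop of the candidate with the lowest administrative distance.

    candidates maps a route source (a key of ADMIN_DISTANCE) to its next hop
    for one destination prefix. Returns None if no route is available.
    """
    if not candidates:
        return None
    source = min(candidates, key=lambda s: ADMIN_DISTANCE[s])
    return candidates[source]


# Normal operation: both routes exist, the MPLS (eBGP) route wins.
print(best_route({"ebgp": "mpls", "floating_static": "dsl-vpn"}))
# After the BGP neighbor is shut down, the eBGP route is withdrawn.
print(best_route({"floating_static": "dsl-vpn"}))
```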

Regarding using ping to determine the status of the connection, I would point out that it can be a little tricky: for example, during congestion it can trigger a switch to the backup circuit just because of dropped packets or delayed replies. Sometimes there is no other way, but be careful and tune it well.
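One common way to tune probe-based failover against congestion is hysteresis: fail over only after several consecutive failed probes, and fail back only after several consecutive successes. The Python sketch below is a minimal model of that idea, similar in spirit to the up/down delays available on Cisco tracking objects but not a reproduction of them; the thresholds `fail_after=3` and `recover_after=5` are hypothetical defaults to tune for your circuits.

```python
from typing import Iterable, List


def debounced_path(probe_ok: Iterable[bool], fail_after: int = 3,
                   recover_after: int = 5) -> List[str]:
    """Return the active path ('primary' or 'backup') after each probe.

    Fail over only after `fail_after` consecutive failed probes and fail back
    only after `recover_after` consecutive successes, so a few dropped or
    late replies during congestion do not cause a switch.
    """
    path = "primary"
    streak = 0  # consecutive probes that disagree with the current path
    history: List[str] = []
    for ok in probe_ok:
        if path == "primary":
            streak = streak + 1 if not ok else 0
            if streak >= fail_after:
                path, streak = "backup", 0
        else:
            streak = streak + 1 if ok else 0
            if streak >= recover_after:
                path, streak = "primary", 0
        history.append(path)
    return history


# Two isolated drops are ignored; three in a row trigger failover.
print(debounced_path([True, False, False, True, False, False, False, True, True],
                     fail_after=3, recover_after=2))
```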

Finally, ping is a way to detect a failure, not to test the backup itself. So, to test what Sharifi suggested, try dropping the ping packets at some point in the path and check whether the backup works properly.
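To judge whether the backup "works properly" during such a test, one option (an assumption, not something described in the thread) is to run a continuous ping from a branch host to a host at corporate while the failure is injected, and record the time of every reply. The longest gap between replies approximates how long traffic was interrupted during the switchover, to within one ping interval. The function name below is hypothetical.

```python
from typing import Sequence


def longest_gap_s(reply_times_s: Sequence[float]) -> float:
    """Longest interval, in seconds, between consecutive successful ping replies.

    reply_times_s: timestamps of the replies received by a continuous ping run
    during the failover test. The result is only as precise as the ping interval.
    """
    times = sorted(reply_times_s)
    if len(times) < 2:
        raise ValueError("need at least two replies to measure a gap")
    return max(later - earlier for earlier, later in zip(times, times[1:]))


# Replies every second, then nothing from t=3 s until t=41 s while failing over.
print(longest_gap_s([0.0, 1.0, 2.0, 3.0, 41.0, 42.0]))
```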

Bye,

enrico


Leo Laohoo
Hall of Fame

The best method is to power down the main router. 

Hi Leo,

it seems to me that wyko has both links connected to the same router:

On the router, both circuits connect on a GigEthernet interface.

If there are two routers, shutting down the primary one is a very good way to test whether the backup link works properly, but if the main goal is to test the backup architecture and configuration, it is not the best way in my opinion. This scenario is closer to a router failure than to a link failure. It could be part of a backup testing procedure (intended to test different failure scenarios), but in my experience the great majority of failures are WAN link failures, so I suggest testing that scenario first.

Bye,

enrico