RV042 Smart Link / Failover is Sticky

fmarshall · ‎04-17-2012

RV042 in Router mode.

WAN1 preferred.

With Smart Link it seems to work to a point.

When WAN1 fails, it fails over to WAN2.

But then it gets stuck on WAN2 and I have to manually switch to WAN2 preferred and then back to WAN1 preferred to get WAN1 connection to return.

The test IP addresses should be just fine as set.

Is there something I should be doing differently?

Jasbryan1 · ‎04-17-2012

Fred,

In your Network Service Detection settings, make sure you are using each ISP's default gateway as the test address for its WAN. Keep in mind that layer 2 will most likely never go down unless the modem dies, so the check has to happen at layer 3 (IP addresses), which is why it's best to use the ISP default gateway. You can simulate a layer 2 failure by just unplugging the Ethernet cable from each WAN port to see if Smart Link fails over accordingly, because you can't easily simulate a layer 3 outage without having other network equipment upstream that you control.

Jasbryan

fmarshall · ‎04-17-2012

Well, this brings up a question that I've had for a long time and it looks like you're trying to help me understand. I just don't yet.....

Why are there all those categories in the setup?

i.e.

Default Gateway

ISP Host

Remote Host

DNS Lookup Host

???

What I've been doing is entering an IP address in the ISP Host space.

The IP address might be at a linked internal device or an external IP address such as the ISP DNS server or mail server or....

The idea has been this:

If the WAN1 link dies between the local site and a remote site providing internet connection then I want to failover. But, if the internet service to the remote site dies I don't want to failover. This means I need to sense some device within the remote site.

If the WAN2 link dies for internet service then I don't want to use it. So, I use a public internet IP address here.

I'm sure you can translate this into your own language as I didn't follow your reply all that well. Maybe you can tell me what would be better to do.

Pulling the cables does seem to work.

But, since WAN1 is "preferred" should the system continue to test it and switch back when it's available?

Right now it seems to stick on WAN2 once switched.

Thanks for the quick reply.

fmarshall · ‎04-17-2012

Perhaps I should explain the topology a bit. It's pretty simple but does have a few devices:

The question here is about RV042#1.

There is LAN2 at the originating site with RV042#1 linked to the main site LAN1 for intersite comm's AND LAN1 is the primary internet source.

LAN2 <>LAN RV042#1 WAN2 <> firewall <> ADSL modem <

RV042#1 WAN1 <> private link interim subnet<>......

........<>private link interim subnet<>LAN RV042#2 WAN1 <> LAN1 <> firewall <> LAN RV042#3 WAN1 <> ISP fiber

RV042#3 is only an interfacing device with the ISP fiber - with multiple public addresses available on its LAN side.

RV042#2 is an intersite link router.

RV042#1 is set up to failover with WAN1 preferred as it goes to the higher speed ISP fiber via the connected site.

It should failover if the private link goes down but not if the ISP fiber internet goes down - thus favoring the inter-site connection over internet.

RV042#1 fails over to WAN2 which provides internet connection / giving up inter-site connection.

All RV042 addresses are private range addresses except RV042#3 which has nothing but public addresses and seems to work fine.

Jasbryan1 · ‎04-17-2012

Fred,

Think of it this way: the device (modem) physically plugged into each WAN port is layer 1, the physical connection between devices. Layer 2 (the data link layer) is the forwarding of data frames from one unit to the next. Layer 3 is the routing layer. The default gateway of your network is your RV042. Each WAN ISP has a default gateway (layer 3), which is the default route out to the next network, and so on. Generally, if you can reach your ISP default gateway then you have internet access, unless someone has majorly screwed up. So I tend to use my ISP default gateway for my service detection.

Since you are trying to simulate a network down, just physically pull the Ethernet cable out of the WAN port. This will stop the forwarding of frames (layer 2) and cause failover.

Here is how failover works with network detection. Let's say we have set our ISP default gateway for detection. The router will continuously send pings to those addresses, and if an address fails to respond it should cause a failover. Give it 30 seconds to fail over.
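As a rough illustration of that detection logic (not the RV042's actual firmware), here is a minimal Python sketch. The names `link_is_up`, `choose_wan`, and `retry_count` are made up for the example, and each probe is a plain callable standing in for a ping to that WAN's configured test address. It assumes a link counts as up if any one of the retries gets a reply.

```python
from typing import Callable


def link_is_up(probe: Callable[[], bool], retry_count: int = 5) -> bool:
    """Assumed rule: the link is up if any of retry_count probes gets a reply."""
    for _ in range(retry_count):
        if probe():
            return True
    return False


def choose_wan(wan1_probe: Callable[[], bool],
               wan2_probe: Callable[[], bool],
               retry_count: int = 5) -> str:
    """Pick the WAN to use for one detection cycle, with WAN1 preferred.

    WAN1 is probed on every cycle, so in this model traffic returns to it
    as soon as its test address answers again.
    """
    if link_is_up(wan1_probe, retry_count):
        return "WAN1"
    if link_is_up(wan2_probe, retry_count):
        return "WAN2"
    return "none"
```

In this model nothing is sticky: calling `choose_wan` again after WAN1's test address recovers returns "WAN1". If the real device does not behave that way, the difference is in how it re-tests the preferred link, which this sketch cannot show.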

Try and pull the cable physically for testing right now and see if failover works.

Jasbryan

fmarshall · ‎04-17-2012

Yes, as I said, pulling the cable causes failover but I'm not interested in manual intervention as a general rule. As a test, it does work as I said before. My concern is being stuck on WAN2, the secondary WAN connection once failover has occurred once. Or, is the device behavior that all you get is failover one time and THEN manual intervention is REQUIRED?

The latter can be answered yes or no.

If "no" (meaning it will automatically switch back) then is there some magic about which IP addresses I put where in the setup? I'm reading a lot of explanations but not explanations that answer my really fundamental question about how this should be done.

I'm getting a bit concerned that there is a mindset or mental model that doesn't necessarily match up with the reality here. For example, there are NO modems attached to any RV042s anywhere. So why the mention of modems in particular?

RV042#1 is the default gateway on LAN2, yes.

And, certainly yes, I can understand using "modems" as a generic model.

In my simple mind I provide an IP address to ping as a check of "link is up". That's all. What more is there to it?

Now, I do understand that there can be Level 2 communications between devices based on their MAC addresses but I don't see why that would matter. If the target fails to respond, it fails to respond whether Level 3 IP addressing or Level 2 MAC addressing. Why would it be any more complicated than saying: the test succeeds or the test fails?

Why are there 4 different settings possible? That's not clear at all. It almost seems that the failover test will run through a whole set of 4 if they are provided. It's not clear that they MUST be provided. Logic suggests that a single address would be enough and that one could select an address in the topology that suits the needs. Is that not the case?

fmarshall · ‎04-20-2012

I really appreciate the help. I think I would do better if I understood the definition and purpose and INTERACTION of those 4 entries.

Well, I can figure out "Default Gateway" ... I think. But maybe I don't understand the designer's context for this one even.

And, I tried entering the DNS Host and an IP wouldn't do so I put in the URL. I also think I can figure that one out OK.

Then there are: ISP and Remote Hosts.

According to the documentation with my comments at "***"

Default Gateway:

If you check this item, the Router will ping the default gateway first.

***OK. That's easy. But it doesn't say "you must check this item". So, I had not.

ISP Host:

After ping Default Gateway, the Router will ping ISP Host “Retry timeout" later. The ISP Host is provided by ISP.

***I guess I just pick an IP address belonging to the ISP or what? Or I could pick the public address gateway at the ISP.

Remote Host:

Enter the IP address of Remote Host that you’re going to ping.

***OK. So I could pick anything in the public address space that normally works, eh? But, does it necessarily have to be a public address? How about an upstream address in my network?

DNS Lookup Host: Enter the Host Name or Domain Name that you’re going to ping.

***Well, I would have thought that this means the host name of the ISP DNS server. But here it seems to say it can be almost any URL. Is the point here that it's a test of DNS service?

I think perhaps this will help reveal where I'm getting hung up. I should think that the design intends to test the closest connection first and surely failover if it fails. Then on to the next, and the next, etc.

If that's the case then I should think one could pick a single IP address to test and that's all. Is that correct? That's what I've been doing because I think that will effect the behavior I need.
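To make that reading concrete, here is a small Python sketch of it. This is only my interpretation of the documentation, not confirmed device behavior: the checks run nearest-first, entries left blank are skipped, and the first configured entry that fails marks the link as down. `CHECK_ORDER`, `first_failed_check`, and the `probe` callable are hypothetical names, and the addresses used with it are documentation-range placeholders.

```python
from typing import Callable, Dict, Optional

# Order taken from the setup page, nearest check first.
CHECK_ORDER = ["Default Gateway", "ISP Host", "Remote Host", "DNS Lookup Host"]


def first_failed_check(targets: Dict[str, Optional[str]],
                       probe: Callable[[str], bool]) -> Optional[str]:
    """Return the name of the first configured check that fails, else None.

    targets maps a check name to its address or host name; missing or
    empty entries are treated as "not configured" and skipped.
    """
    for name in CHECK_ORDER:
        target = targets.get(name)
        if not target:
            continue  # this entry was left blank in the setup
        if not probe(target):
            return name  # assumed: one failure is enough to declare the link down
    return None
```

With only ISP Host filled in, for example `first_failed_check({"ISP Host": "192.0.2.1"}, probe)`, the result depends on that single address alone, which is the single-address setup described above.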

I look forward to hearing more. Thank You!!

fmarshall · ‎04-22-2012

Thanks for sticking with me on this question. I'm sorry if my last 2 posts were a bit long and detailed.

The questions that may boil it down are these:

"Is manual intervention required to switch back to the Primary WAN after failing over to the Secondary WAN if "Remove the Connection" is selected?"

(the documentation says: "Once ISP returns to connect, the traffic will be dispatched back." but I don't see this happening)

"If "Generate the Error Condition in the System Log" is selected, does that imply that the connection will NOT be removed? As in switched away from?"

The other question is:

"Can only a single IP address be entered into the Network Service Detection?" From the comments you've given it seems like maybe not. But the documentation says "at least one".

rmanthey · ‎04-23-2012

Fred,

First, the reason for the multiple options is to provide multiple solutions for failover. Different clients would use different options depending on their topology or requirements. If the default gateway is used, what happens if the problem is past the modem (usually the default gateway at your demarc)? The link wouldn't fail over and your WAN would be down, because the default gateway modem is working but nothing past it is. If you use ISP Host, this might work, but what happens if the ISP is having an issue with its connection to the Internet? You would be down and would not fail over, because the ISP is working, just not its connection to the world. Plus, ISPs sometimes block ICMP to their devices. Remote Host works, but what happens if that remote host stops responding? You go down for no particular reason. DNS Lookup Host is probably the best option, using a public DNS server; the problem is that some ISPs block outbound DNS queries and only allow queries to their own DNS servers.

So to answer your question: you only need one, but which one depends on the implementation you need. If you are familiar with IP SLA monitoring on the Enterprise side, it seems to function most like that methodology. It should fail back within a given period of time, depending on the stability of the link. You could experience flapping, which would cause the failover to come up and go down, so it would look like both WANs were down. For this reason it needs to wait to ensure the WAN is truly back up and stable. I do not have the timers; hopefully someone can respond with those numbers if they have them.
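To show what "wait until the WAN is stable" could look like, here is a minimal Python sketch of a fail-back hold-down. It is only an illustration of the general idea, not the RV042's implementation, and since I do not have the real timers the `stable_checks_required` value is a made-up placeholder, as is the `FailoverState` class itself.

```python
class FailoverState:
    """Tracks the active WAN, with WAN1 preferred and a fail-back hold-down."""

    def __init__(self, stable_checks_required: int = 3) -> None:
        self.active = "WAN1"
        self.required = stable_checks_required  # hypothetical value
        self.good_streak = 0  # consecutive successful WAN1 checks

    def update(self, wan1_up: bool) -> str:
        """Feed in one WAN1 detection result; return the WAN to use."""
        if not wan1_up:
            # Any failure resets the streak and moves traffic to WAN2.
            self.good_streak = 0
            self.active = "WAN2"
        else:
            self.good_streak += 1
            # Fail back only after WAN1 has looked good for long enough.
            if self.active == "WAN2" and self.good_streak >= self.required:
                self.active = "WAN1"
        return self.active
```

Feeding it the results up, down, up, up, up with the default of three checks keeps traffic on WAN2 until the third good result in a row, then returns to WAN1. A single good ping after an outage is not enough, which is what protects against flapping.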

hope this helps

Cisco Small Business Support Center

Randy Manthey

CCNA, CCNA - Security