Re: Primary and backup WAN with different bandwidth QoS policy

Chris.Doyle · ‎08-22-2013

I have a scenario where we're moving providers, and most sites will have two routers and a layer 3 switch. We'll have a single VPLS instance from the provider, and all routers will connect to it using the private network 192.168.4.0/24. For simplicity, all routers, including the layer 3 switch, will participate in the same EIGRP AS. The primary WAN link at each site will be 350Mbps and the backup WAN link will be 100Mbps. QoS is enabled on all routers.

If the primary WAN router at the spoke site goes down, the hub router's QoS policy (shape average 350000000) will still assume it has 350Mbps to work with, because that is what the command tells it. Is the answer to manually change the shape average down to 100000000, or is there a smarter way to deal with this?
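To see why the stale shaper matters, here is a rough steady-state model in Python. It assumes the provider simply discards anything above the backup circuit's 100Mbps and ignores bursts (Bc/Be) and TCP backoff; the 300Mbps of offered load is a made-up figure. With the shaper left at 350000000, the hub's QoS policy never sees congestion, so the drops happen inside the provider network, where the hub's class priorities no longer apply.

```python
# Rough steady-state model (assumption: the provider discards whatever exceeds
# the spoke's circuit rate). All rates are in bits per second.

def shaper_outcome(demand_bps: int, shape_bps: int, path_bps: int) -> dict:
    """Split offered traffic into what the hub QoS policy handles
    and what the provider drops downstream."""
    sent = min(demand_bps, shape_bps)         # what leaves the hub shaper
    hub_excess = demand_bps - sent            # excess queued/dropped by the hub's own QoS classes
    delivered = min(sent, path_bps)           # what the backup circuit can actually carry
    provider_drop = sent - delivered          # excess dropped where the hub's QoS has no say
    return {"sent": sent, "hub_excess": hub_excess,
            "delivered": delivered, "provider_drop": provider_drop}

# Hypothetical 300Mbps toward the spoke after its primary WAN router fails
stale = shaper_outcome(300_000_000, 350_000_000, 100_000_000)  # shaper left at 350M
fixed = shaper_outcome(300_000_000, 100_000_000, 100_000_000)  # shaper lowered to 100M
print(stale)
print(fixed)
```

In both cases 100Mbps is delivered; the difference is who chooses what gets dropped. With the corrected shaper the excess is handled by the hub's policy, so priority classes can still be protected.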

The secondary hub router could have a different set of policies containing the secondary bandwidth settings, but the successor route will still be through the primary hub router.

See attachment.

Lei Tian · ‎08-22-2013

Hi Chris,

I just want to be clear on your question. What you are looking for is to make the traffic flow symmetrical. That is, if a packet comes in from the branch on the 100M circuit, you want the return path to go via the 100M circuit from the hub, and if a packet comes in from the branch on the 350M circuit, you want the return path to go via the 350M circuit from the hub.
Is that correct?

Your topology shows direct connectivity between spoke and hub, so maybe you can have each spoke router advertise specific routes to one hub neighbor and less specific routes to the other. The primary hub then learns specific routes from the primary spoke and less specific ones from the secondary spoke; the secondary hub learns specific routes from the secondary spoke and less specific ones from the primary spoke.
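A minimal sketch of why this works at the hub, assuming a hypothetical spoke summary of 10.0.48.0/21 that covers the spoke's 10.0.50.0/24. The specific route and the summary are different prefixes, so both get installed, and forwarding follows the longest match. The table models the hub switch after the primary spoke router has failed, when only the backup hub router still holds the /24; the router names are placeholders.

```python
import ipaddress

# Hypothetical hub-switch routing table after the primary spoke router fails.
ROUTES = {
    ipaddress.ip_network("10.0.48.0/21"): "primary-hub",  # summary learned from the backup spoke
    ipaddress.ip_network("10.0.50.0/24"): "backup-hub",   # specific learned from the backup spoke
}

def next_hop(destination: str) -> str:
    """Longest-prefix match over ROUTES."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        raise LookupError(f"no route to {destination}")
    return ROUTES[max(matches, key=lambda net: net.prefixlen)]

print(next_hop("10.0.50.10"))  # backup-hub: the /24 beats the /21
```

No metric tuning is involved in this step: the backup hub wins purely because its route is more specific.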

Sent from Cisco Technical Support iPhone App

Chris.Doyle · ‎08-22-2013

Hi Lei,

If it's coming from the 100M circuit at the branch, it doesn't have to return over the 100M circuit from the hub. The reason I suggested this is that the primary hub could have its QoS policies set to the primary bandwidth (350) and the secondary hub could have its QoS policies set to its secondary bandwidth (100). However, if you force traffic through this router only and the secondary hub fails, you'd lose connectivity.

I'm not sure whether there is an intelligent way to deal with this. I'm open to suggestions, as we're just planning for the failure case. At the end of the day we'd just have to remember to change the shape average command manually. We were hoping for something that would take care of it automatically.

Lei Tian · ‎08-23-2013

Hi Chris,

You won't lose any redundancy. With that routing in place, if the primary spoke WAN router fails, the spoke will use the secondary WAN router to get to the hub. The hub will also use its secondary WAN router for the return traffic, because that's where it learns the more specific routes from. That way both spoke and hub use the 100M circuit. If you consider a double failure, where the spoke primary and the hub secondary both fail, you will still have the bandwidth asymmetry issue, but normally we don't design for double failures.
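The failure cases above can be enumerated with a small model. This is a sketch under simplifying assumptions: H1/H2 are the primary/backup hub routers, S1/S2 the primary/backup spoke routers, each spoke router sends its /24 only to the hub router whose shaper matches its circuit and a /21 summary to the other, and between routes of equal length the hub switch prefers H1 on metric. It covers hub-to-spoke traffic only.

```python
from itertools import combinations
from typing import Optional, Tuple

SHAPE = {"H1": 350, "H2": 100}          # hub router shaper rate, Mbps
CIRCUIT = {"S1": 350, "S2": 100}        # spoke router access circuit, Mbps
SPECIFIC_TO = {"S1": "H1", "S2": "H2"}  # spoke router -> hub router that receives its /24
HUB_ORDER = ["H1", "H2"]                # metric preference at the hub switch

def hub_to_spoke(failed: frozenset) -> Optional[Tuple[str, str]]:
    """Return (hub router, spoke router) carrying hub-to-spoke traffic."""
    routes = []
    for hub in HUB_ORDER:
        for spoke in SPECIFIC_TO:
            if hub in failed or spoke in failed:
                continue
            plen = 24 if SPECIFIC_TO[spoke] == hub else 21  # specific vs summary
            # Longest prefix first, then the better-metric hub router
            routes.append((plen, -HUB_ORDER.index(hub), hub, spoke))
    if not routes:
        return None
    _, _, hub, spoke = max(routes)
    return hub, spoke

for n in (0, 1, 2):
    for failed in combinations(["H1", "H2", "S1", "S2"], n):
        path = hub_to_spoke(frozenset(failed))
        if path is None:
            print(failed, "-> no path")
        else:
            hub, spoke = path
            status = "match" if SHAPE[hub] == CIRCUIT[spoke] else "MISMATCH"
            print(failed, "->", hub, spoke, status)
```

Every single failure ends on a matching shaper and circuit. Only the double failures that pair the surviving hub router with the other spoke router mismatch: H2+S1 down (the case Lei mentions) overdrives the 100M circuit, while H1+S2 down merely under-uses the 350M one.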

My understanding is that you are looking to resolve the asymmetric bandwidth issue between hub and spoke, which happens when routes between hub and spoke are asymmetric. My suggestion is to make the routing symmetric instead of making the QoS policy dynamic.

HTH,

Lei Tian

Chris.Doyle · ‎08-23-2013

OK, well, that sounds like something to consider. Do you have any config showing how you would achieve this?

Kelvin Willacey · ‎08-23-2013

I agree with Lei: you just need to ensure that the routing stays symmetric when a failure occurs. The routes learned by the layer 3 switches should prefer the 350 Mbps link on each router first and the 100 Mbps link second. It's just a matter of calculating the metric and using the correct delay.
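For reference, the classic EIGRP composite metric with default K values (K1 = K3 = 1, K2 = K4 = K5 = 0) is 256 * (10^7 / minimum path bandwidth in kbps + total path delay in tens of microseconds), using integer division. The sketch applies it to hypothetical interface values: GigabitEthernet links at 10 microseconds, the primary WAN interface with bandwidth 350000, and the backup WAN interface with bandwidth 100000 plus delay 100 (IOS configures delay in tens of microseconds, so 1000 microseconds). Wide/named-mode metrics scale differently.

```python
def eigrp_classic_metric(hops):
    """Classic EIGRP metric with default K values.

    hops: (bandwidth_kbps, delay_usec) for the originating interface and
    for each interface the route is received on along the path.
    """
    min_bw = min(bw for bw, _ in hops)          # slowest interface on the path
    delay_tens = sum(d for _, d in hops) // 10  # cumulative delay in tens of usec
    return 256 * (10_000_000 // min_bw + delay_tens)

GIG = (1_000_000, 10)         # hypothetical GigabitEthernet LAN port / switch SVI
PRIMARY_WAN = (350_000, 10)   # hypothetical 'bandwidth 350000'
BACKUP_WAN = (100_000, 1000)  # hypothetical 'bandwidth 100000' + 'delay 100'

# Hub switch -> spoke LAN: spoke LAN port, hub router WAN port, hub switch SVI
via_primary_hub = eigrp_classic_metric([GIG, PRIMARY_WAN, GIG])
via_backup_hub = eigrp_classic_metric([GIG, BACKUP_WAN, GIG])
print(via_primary_hub, via_backup_hub)  # 7936 51712
```

Each router adds the bandwidth and delay of the interface it receives the update on, so a route to the backup spoke's LAN learned through the primary hub router is costed with the primary hub's 350 Mbps interface. The backup spoke's own 100 Mbps setting never enters that calculation, which is why metric tuning alone keeps favouring the primary hub.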

Chris.Doyle · ‎08-27-2013

@KWillacey - I don't have an issue with getting the route down the preferred link, i.e. 350Mbps. That's easy; adding the delay command on the secondary router achieves this.

The issue is this: if both the primary and secondary hub routers have QoS policies with their own shape average bandwidth, then when the primary router goes offline and 350Mbps is no longer available, QoS doesn't know that the receiving router is now only running at 100Mbps.

I could set the primary router's QoS policies to 350Mbps and the secondary's to 100Mbps, but then when the primary spoke router goes offline, all routing will still go through the primary hub router because of the metrics in the config: it has a higher bandwidth set on the interface and no delay command.

There must be an intelligent way to deal with this scenario; I'm sure the answer isn't to get two 350Mbps links.

Symmetric routing is already occurring. However, as I've mentioned, because of the metrics currently set in the config, the preferred route will always be via the primary hub (192.168.4.1).

Kelvin Willacey · ‎08-27-2013

Hey Chris, I guess I am not getting the full picture. Based on your diagram, is it safe to assume that all of your routers are connected to both the 350 Mbps and 100 Mbps circuits? What path do packets take when the primary WAN router at the spoke site fails?

My thinking is that the routing should be set up so that if the 350 Mbps link fails on a router but the circuit is still up, the switch sends traffic to the backup router and over the 350 Mbps link. I can't think of a scenario where this would not work, unless the backup router is poisoned so heavily that, when the 350 Mbps link fails on the primary spoke router, the traffic flow becomes asymmetric. That is, the primary hub sends traffic over the 350 Mbps link but the spoke responds over the 100 Mbps link.

I will lab it up and let you know, because it seems to me that you are only facing this issue because asymmetric routing occurs when a link or router fails, which should not happen if it is configured to your specifications.

Chris.Doyle · ‎08-27-2013

Thanks for taking an interest. So yes, the 350Mbps and 100Mbps circuits are connected at both sites from the same WAN provider, using a single VPLS instance. Obviously the bandwidth is limited on the provider's side.

If the primary spoke router fails, the current path is still via the primary hub router, which sends the traffic to the secondary spoke router. As the primary spoke is off, the packet returns the way it came, so the path is symmetric.

I've set this up in Packet Tracer, and yesterday I thought I had it working so that, if the primary spoke router fails, packets would go via the secondary hub to the secondary spoke over the 100Mbps connection. However, that didn't appear to work today, so I'm not sure what changed.

From the L3 switch's point of view, the route to the remote network has the same cost. I can resolve that by changing routing metrics on the L3 switch, but then I'm back to the same issue: it will always choose the primary route to the 350Mbps router. I think I've misconfigured something here, because I don't see why it treats the routes as equal when the connecting routers have either higher bandwidth and lower delay or lower bandwidth and higher delay.

The L3 switch is connected to both routers using two separate VLANs, each a point-to-point link with a /30 mask.

How can the switch send traffic to the 100Mbps backup router and over the 350Mbps primary router if it's down?

If the best way to deal with this is to shape the QoS bandwidth for the 350Mbps network on the primary routers and for the 100Mbps network on the secondary routers, then if the primary spoke (192.168.4.50) turns off and I need to get traffic from 10.0.1.0/24 to 10.0.50.0/24 without it thinking it can use more than 100Mbps, I'd like it to travel down the following path:

10.0.1.1 --> 192.168.4.2 --> 192.168.4.51 --> 10.0.50.1 and back the same path.

But because the metrics are set to make sure routing always goes down the 350Mbps path, the switch will always send this traffic to 192.168.4.1 and then on to 192.168.4.51 if 192.168.4.50 is off.

Hope this helps.

Chris.Doyle · ‎08-27-2013

@Lei Tian - Regarding resolving asymmetric bandwidth issues: the current route path isn't asymmetric. With the bandwidth and delay settings already in place, traffic goes out and comes back along the same path.

I still don't see how you're going to make the secondary hub router the more attractive route when the interfaces involved have bandwidth and delay hard-coded in the config. Unless both the primary spoke and primary hub routers go down, I'm not going to achieve this.

This doesn't sound like an uncommon scenario, or perhaps I'm overlooking something. It's probably just as easy to change the shape average on the fly when a hardware failure occurs; the issue is that someone will forget to change it back, so they'll end up running at a fraction of the available bandwidth.
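If the manual change is kept as a fallback, the edit itself is small enough to generate rather than type. The sketch below only builds the configuration text; it assumes a hypothetical parent policy-map named WAN-SHAPE-SPOKE50 whose class-default holds the shaper, and it leaves open how the failure is detected and how the lines reach the router (EEM, a management script, or by hand).

```python
from typing import List

PRIMARY_BPS = 350_000_000  # primary circuit, as in 'shape average 350000000'
BACKUP_BPS = 100_000_000   # backup circuit

def shaper_config(policy_name: str, primary_spoke_up: bool) -> List[str]:
    """Return IOS-style lines setting the parent shaper for the active circuit.

    policy_name is a hypothetical parent policy-map whose class-default
    carries the shaper.
    """
    rate = PRIMARY_BPS if primary_spoke_up else BACKUP_BPS
    return [
        f"policy-map {policy_name}",
        " class class-default",
        f"  shape average {rate}",
    ]

# Primary spoke router down: shape to the backup circuit
print("\n".join(shaper_config("WAN-SHAPE-SPOKE50", primary_spoke_up=False)))
```

Restoring is the same call with primary_spoke_up=True, so the "forgot to change it back" risk only goes away if whatever triggers the change also runs on recovery.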

Lei Tian · ‎08-27-2013

Hi Chris,

If you look at my reply, my suggestion is to advertise specific routes to the neighbor whose bandwidth matches, and a summary route to the neighbor whose bandwidth does not. So the primary hub learns specific routes from the primary spoke and a summary route from the backup spoke; the backup hub learns specific routes from the backup spoke and a summary from the primary spoke.

HTH,
Lei Tian

Chris.Doyle · ‎08-27-2013

Do you have an example config you could share?

Are you suggesting using a route map to match the traffic and tell it where to send it?

Lei Tian · ‎08-28-2013

Hi,

I don't have a config example, but if you want to share your current config, we can work on it.
Not policy-based routing; I was thinking of tools like normal routing updates, summarization or route filtering.
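As an illustration of the summarization half of that idea, the sketch below finds the smallest single prefix covering a spoke's subnets, i.e. the route a spoke router would send to the hub router whose bandwidth does not match. The subnets 10.0.51.0/24 and 10.0.52.0/24 are hypothetical additions to the 10.0.50.0/24 LAN from the thread, and the summary must not overlap any other site's addressing; the check below confirms it does not include the hub's 10.0.1.0/24.

```python
import ipaddress

def summarize(prefixes):
    """Return the smallest single network that contains every given prefix."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    summary = nets[0]
    # Widen the first prefix one bit at a time until it covers all the others
    while not all(n.subnet_of(summary) for n in nets):
        summary = summary.supernet()
    return summary

spoke_lans = ["10.0.50.0/24", "10.0.51.0/24", "10.0.52.0/24"]  # hypothetical
summary = summarize(spoke_lans)
print(summary)  # 10.0.48.0/21
print(ipaddress.ip_network("10.0.1.0/24").overlaps(summary))  # False
```

The specific /24s go to the matching hub router and only this summary goes to the other one, so the longest-match behaviour picks the matching path whenever it exists.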

HTH,
Lei Tian