03-23-2011 07:48 AM - edited 03-06-2019 04:13 PM
Hi All,
I'm redesigning the core and distribution layers in our company network. Currently we only have one core/distribution switch (6504E) and 10 first-level access switches (3560G). I plan to push the distribution function out of the core to two 3560G switches, and in addition I will set up a second core. Usually we would use STP between the distribution and access layer switches, and HSRP/VRRP between core and distribution. I've seen lots of people talking about routed access in the campus network to reduce convergence time during an outage. That design uses EIGRP or OSPF everywhere (I prefer OSPF), so there is no STP or HSRP/VRRP any more. Sure, the access layer switches will still have VLANs for end points. The main advantage of the routed design is that real-time applications keep working with low latency during an outage, which is what we want for our VoIP and video conferencing. So my question is: are there any disadvantages to the routed design?
At this time, the headquarters has one large /16 subnet. With routed access switches, I would need to split the /16 into several /24 subnets. Each access switch connected to the distribution switch would have a different /24 subnet, and everything would be summarized at the distribution switch as the /16.
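To make the idea concrete, here is a minimal sketch of what a routed access uplink and the summarization could look like (all addresses, interface numbers and the OSPF process/area numbers are purely illustrative, assuming a 10.1.0.0/16 headquarters block with one /24 per access switch):

```
! Access switch: uplink to distribution as a routed port
interface GigabitEthernet0/25
 no switchport
 ip address 10.1.250.2 255.255.255.252
!
router ospf 1
 network 10.1.10.0 0.0.0.255 area 1    ! local user /24
 network 10.1.250.0 0.0.0.3 area 1     ! uplink
!
! Distribution switch (ABR): summarize the access /24s toward the core
router ospf 1
 area 1 range 10.1.0.0 255.255.0.0
```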
The 6504E (Sup32) has the same 32 Gbps backplane capacity as the 3560G, so its throughput is no higher than the 3560G's. We aggregate the remote sites' traffic at the core, so the core doesn't see too much traffic; the current backplane can still handle it even when several remote links run at full throughput.
What do you think, guys? Any advice on doing a routed design? Is it worth trying the routed access design? Thanks a lot. Really appreciated.
Lou
03-23-2011 08:10 AM
Lou
If you have 3 separate layers, i.e. core/distro/access, then the traditional design is L3 between distro and core and L2 between access and distro. You would then run HSRP/VRRP on the distro switches and have the distro switches do the inter-VLAN routing for the VLANs on the access-layer switches.
A routed campus design extends L3 to the access layer, so your VLANs cannot extend across access-layer switches, which is the most limiting factor. In a DC environment this is not a good solution, but in a campus design it can work well. We deployed an L3 access layer in one of our buildings precisely because of VoIP.
So the main decision you need to make is can you isolate vlans to each access-layer switch (or pair of switches). If you do not need a vlan to span all or most of your access-layer switches then a L3 design is a perfectly good design to go with. Each access-layer switch will be responsible for routing the vlan(s) that are local to that switch.
Assuming you dual-connect each access-layer switch, there will be 2 equal-cost paths from each access-layer switch to non-local subnets. If a link fails, failover to the other link is automatic for all traffic.
Couple of other things to bear in mind -
1) If you are going to route off the access-layer switches then there is no need for a common management VLAN for the switches; indeed, this design means you can't have one. So use loopbacks on your switches to manage them.
2) I take on board what you say about OSPF, but EIGRP is a very good choice for this sort of setup. You can dual-link the access-layer switches so each switch sees 2 equal-cost paths. You can also then configure the access-layer switch as an EIGRP stub router so it is not used as a transit switch to route between other networks. Still, there is no reason why you cannot use OSPF instead if that is your preferred choice.
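A rough sketch of both points together on an access switch (the AS number, loopback address and network statements are illustrative only):

```
! Management loopback instead of a management VLAN
interface Loopback0
 ip address 10.1.255.11 255.255.255.255
!
! EIGRP stub router: advertises its own networks but is
! never used as transit between other networks
router eigrp 100
 network 10.1.0.0 0.0.255.255
 eigrp stub connected summary
```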
Jon
03-23-2011 08:28 AM
Thank you so much Jon for your detailed answer. Dumb question: what's a DC environment?
I've just rethought this routed design. One situation would not allow me to use the routed design: we have a call recording system on the network. It listens to an RSPAN VLAN which spans several switches. If the switches are isolated with different VLANs, there is no way for the recording machine to sniff voice traffic spread over the building network. I guess we need to stick to the traditional design for a while until we have a new solution for recording.
Another question about convergence time: how long does it usually take to converge if one distribution switch goes down? Say I have two core and two distribution switches. With a VoIP call running during the outage, does the call drop, or is it just silent for a very short time?
Really appreciate your help!
Lou
03-23-2011 09:02 AM
DC = Data Centre
Yes RSPAN would be a problem. That would be a good reason to continue to use L2 from the access-layer switch.
A possible alternative, if you have the spare fibres, is to run L3 from the access layer for the data/voice VLANs and then have a separate L2 connection from the access switches specifically for the RSPAN VLAN, but it does depend on the fibres.
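A sketch of what the dedicated RSPAN link could look like on the access switch (the VLAN number and interface are illustrative only):

```
! RSPAN VLAN carried on its own L2 trunk, separate from the routed uplinks
vlan 900
 name VOICE-RECORDING
 remote-span
!
interface GigabitEthernet0/26
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk allowed vlan 900
```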
As for failover, if you use RSTP/MST failover can be very quick. If the active link from the access-layer switch goes down, you can tune RSTP down to a couple of seconds of failover, which should be fine. Bear in mind that hopefully the failure of one of your distribution switches is not a common occurrence :-)
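The basic RSTP setup on the access switch would be along these lines (the port range is illustrative):

```
! Rapid PVST+ for fast convergence
spanning-tree mode rapid-pvst
!
! End-station ports go straight to forwarding as edge ports
interface range GigabitEthernet0/1 - 24
 spanning-tree portfast
```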
Jon
03-23-2011 09:25 AM
Thanks a lot Jon. I guess a couple of seconds could drop the calls. The second-fibre idea is great, but it doesn't have flexibility if we want to record more phones on different switches; it would need more connections from different switches to the recording server. I guess we will use the traditional design and tune the settings to shorten the convergence time.
Another question about the 3560G as a distribution switch: do you think it's capable? It's the 24-port model and we currently only use 12 ports. In the future I could use etherchannel to bundle some links from the access switches, but 24 ports is enough for a while.
Lou
03-23-2011 11:49 AM
Lou
Whether the 3560 is capable depends entirely on how many uplinks you use and how loaded those uplinks will be. It's not really possible to say without knowing the bandwidth specs, but as a general answer there is no reason why a pair of 3560 switches cannot act in the distribution role.
Often you see 3750s in this role for a similar-sized network, simply because you can stack 3750 switches and then run MEC (multi-chassis etherchannel), which means you can connect an access-layer switch with an etherchannel across members of the switch stack. This way you do not get any blocking on the uplinks because STP/RSTP sees the etherchannel as one link.
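A minimal cross-stack etherchannel sketch on the 3750 stack (interface and channel numbers are illustrative; one member link on each stack member):

```
! One physical link on stack member 1 and one on member 2
interface range GigabitEthernet1/0/1 , GigabitEthernet2/0/1
 channel-group 1 mode active    ! LACP
!
! The bundle is one logical link, so STP never blocks an uplink
interface Port-channel1
 switchport trunk encapsulation dot1q
 switchport mode trunk
```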
3560 switches don't stack and thus don't support MEC, which means the link to one of the distro switches has to block (note that this link could be a single link or an etherchannel).
So if you are really worried about STP then 3750 switches may be a solution.
Worth noting though that there are many networks out there running VOIP on a L2 access-layer design.
Jon
03-23-2011 12:00 PM
Stacked 3750 distribution switches are a great idea. We do have one pair of 3750s stacked together for our SAN, and I like that there's no switching loop on the 3750s. Each access switch will still have two links connected to different physical switches, although to one logical switch. I remember when I tested the pair of 3750 switches before: when the master 3750 went down, layer 3 traffic was down for around 30 seconds although layer 2 traffic kept working fine. This would definitely affect real-time applications between a remote site and headquarters. Is there any way to shorten this time? This long downtime could be one drawback to implementing stacked 3750s as distribution switches.
Lou
03-23-2011 12:29 PM
Lou
30 seconds to begin forwarding data again, or 30 seconds to rebuild L3 neighborships? Either sounds too long.
Did you enable NSF on the 3750 stack ? See this link for more details -
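For reference, enabling Cisco NSF for OSPF on the stack is a one-line addition under the routing process (the process ID is illustrative):

```
! Non-stop forwarding: keep forwarding during a master switchover
router ospf 1
 nsf
```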
Jon
03-23-2011 12:39 PM
Actually we don't use that pair of 3750s as a router, and I didn't apply NSF. They are just layer 2 access switches connected to a distribution switch, and the traffic between them is management traffic only. When we tested the failover, we powered down the master 3750 switch. At that time we were trying to ping the master's IP address, and it didn't respond until 30 seconds had passed. This doesn't affect our layer 2 SAN traffic at all, so we didn't care about it then. But right now it matters.
Tell you what: I have two 3750-12S on hand. I can quickly set them up to test it and see what I get.
Lou
03-24-2011 06:08 AM
Yesterday I set up two 3750Gs in a stack, connected to one 3560G. When the master 3750G went down, the VLAN 1 interface couldn't be pinged from the 3560G for a long time, even more than 30 seconds. I think NSF won't help with this. Am I missing something? I just applied channel-group 1 mode active to the interfaces on the 3750s, nothing special. This is weird.
Lou
03-24-2011 07:11 AM
Lou
Have you enabled the "stack-mac persistent" feature?
If not, can you -
1) when the master fails, clear the mac-address and ARP tables and try again
2) if 1) works, try using the persistent feature -
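The persistent stack MAC feature is a single global command on the 3750 stack (timer 0 keeps the stack MAC indefinitely after a master switchover, so neighbors' ARP entries stay valid):

```
stack-mac persistent timer 0
```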
Jon
03-24-2011 09:15 AM
I found out why it took a long time to ping VLAN 1 on the stacked 3750s: VLAN 1 went down during the master switchover. The etherchannel went down too, even though one interface was still up. That's odd. I would think the etherchannel and VLAN 1 should come back up very shortly after the master fails, but it took a very long time for them to come back up. Something is off; I don't think MEC should behave like this.
Lou