
Redundancy Plan

poulid
Level 1

Hello. I'm sure this topic has been beaten to death, but one more kick won't hurt. We have a data center with approximately 140 servers. Most servers connect to 2950T switches, which connect to the core switch (a 4507R) via Gigabit Ethernet uplinks. The core 4507R has a 48-port Gigabit Ethernet line card, routing is done via inter-VLAN routing, and OSPF runs to the WAN. In addition, roughly 35 servers are connected directly to the core switch; this was done because of the gig capability of the line card, so the servers with the most data could be backed up faster. We have approximately 10 subnets (not including the WAN), 3 of which are user subnets and 6 of which are server subnets.

Our manager has asked us to investigate redundancy for the 4507R, so I have recommended implementing a 4506 with the same line card configuration. I have also proposed that we remove the 2950s that uplink the servers and replace them with a stack of 3750G-24 switches. That way we could carve the stack into the 6 existing server subnets and spread each VLAN across the different switches in the stack, so we don't have to worry that losing one switch takes down an entire subnet. I have also recommended pulling the servers that are plugged directly into the core and connecting them to the stack of 3750s instead.
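To give an idea of what I mean, the stack side would look something along these lines (VLAN numbers, addresses and port assignments are only placeholders, not our real layout):

! 3750 stack - define the server VLANs
vlan 20
 name SERVERS-A
vlan 21
 name SERVERS-B
!
! Spread servers in the same VLAN across different stack members
interface GigabitEthernet1/0/1
 description server on stack member 1
 switchport mode access
 switchport access vlan 20
!
interface GigabitEthernet2/0/1
 description server on stack member 2
 switchport mode access
 switchport access vlan 20
!
! Uplink trunk to the core, carrying the server VLANs
interface GigabitEthernet1/0/25
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk allowed vlan 20,21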

Also, our manager would like to see the critical servers dual-homed, and I don't like the idea of plugging one NIC from server X into the 4507R and the other into the 4506. I've got some pics; which looks like the better design? As a rule, should servers be plugged directly into the core switch, or should they be plugged into the distribution/access layer?

18 Replies

Another reason I would like to move the servers off the core switch and onto a stack of 3750s is the oversubscription on the 48-port Gigabit line card. We see a lot of buffer errors on switch ports when our backups run at night, and Cisco TAC tells me they are showing up because we are pushing too much data through each of the 8-port groups.

The switch just doesn't seem to be built to have a lot of servers plugged directly into it when they are moving a lot of data.

Hi

We used to have our servers plugged directly into core 6500s in our datacentre until we ran out of ports, and then we had to use access-layer 6500s for the servers. Both will work, but the more scalable model is certainly to connect servers into access-layer switches.

Another factor to take into account is blade systems (HP/IBM etc.), where each chassis has 2 integrated Cisco switches. You would want to connect those into your core switches, not your access-layer switches.

One of the key decisions in a design is to build something that isn't just fit for purpose now but can grow without a major change to the network infrastructure. Of course, as a previous poster mentioned, budget is also an issue.

One advantage of your design is that because you are not connecting your 4500 switches directly to each other (correct me if you are, but your diagram doesn't show it), you have no loops, which means STP does not kick in. But this also means that your HSRP traffic will need to go via the access-layer switch links.

A more serious problem could be with your traffic flows, depending on how you set your trunks up. For argument's sake, let's say you set up the trunks to each switch to allow only the VLANs in use on that switch, so, for example, the trunks to your 3750 switches only allow the server VLANs.

Now say the HSRP active gateway for one of your client VLANs is the 4507. A client in that VLAN wants to talk to a server, and the server VLAN's HSRP active gateway is the 4506.
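To make that gateway placement concrete, the HSRP config would look something like this (VLAN numbers and addresses are made up purely for illustration):

! On the 4507 - active gateway for the client VLAN
interface Vlan10
 ip address 10.1.10.2 255.255.255.0
 standby 10 ip 10.1.10.1
 standby 10 priority 110
 standby 10 preempt
!
! On the 4506 - active gateway for the server VLAN
interface Vlan20
 ip address 10.1.20.3 255.255.255.0
 standby 20 ip 10.1.20.1
 standby 20 priority 110
 standby 20 preempt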

Now let's assume that the link from the client switch to the 4506 dies.

The client on that switch sends a packet to the server. The packet reaches the server via the 4507. The server responds and sends the reply to its active gateway on the 4506.

When the reply reaches the 4506, that switch now has no way of sending it back to the client, as its link to the client VLAN has gone down.

If you have a layer 2 trunk between your two 4500 switches that carries all VLAN traffic, this would not be an issue: the 4506 would send the traffic across the L2 trunk and the 4507 would send it back into the client VLAN.

That's why a lot of designs have a layer 2 etherchannel trunk between the core switches. HSRP would also flow across this link.

The downside is that you now have loops in your network and STP has to come into the picture. However, with RPVST+ the failover time on the loss of a link can be reduced to seconds.

It's always a trade-off in design, but I would go with your solution and add a layer 2 etherchannel trunk between your 4500 switches.

If you do this, make sure you set the 4500s to be the spanning-tree root and secondary root for all VLANs to ensure optimal traffic flows.
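Something along these lines on the 4500s would do it (slot/port numbers and the channel-group mode are only examples, adjust to suit your line cards):

! Bundle two ports into an etherchannel and trunk all VLANs across it
interface range GigabitEthernet3/47 - 48
 channel-group 1 mode desirable
!
interface Port-channel1
 switchport
 switchport mode trunk
!
! Rapid PVST+ for fast convergence; 4507 as primary root, 4506 as secondary
spanning-tree mode rapid-pvst
spanning-tree vlan 1-4094 root primary
! (on the 4506 use: spanning-tree vlan 1-4094 root secondary)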

HTH

Jon

I agree with Jon. This was also a suggestion I made in my previous post. The etherchannel is still needed regardless of your access-layer design, and Jon did a wonderful job of explaining why it is ideal. I think with the suggestions made in here you should have a pretty decent design to present to your boss.

Thanks for taking the time to write up that fantastic post. It looks like I'll add the etherchannel between the core switches, and maybe I'll investigate buying a used 6509E instead of a new 4506. It seems like you can pick up used equipment much cheaper and just add it to a SmartNet contract.

Thanks again.
