07-04-2009 03:52 AM - edited 03-06-2019 06:35 AM
Hi,
I have been working on a serious problem for over a week but have not yet found a solution. We are experiencing severe network flooding on all switchports, which is causing heavy packet loss across our servers. At first I thought it was a broadcast storm, but when I connected a laptop to the Server VLAN and ran Wireshark, I saw a large number of unicast packets that had nothing to do with the laptop: they were traffic between servers. This didn't make a lot of sense at first, as it looked like our switches were behaving more like layer 1 hubs.
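The hub-like behaviour is what a learning switch does with one class of unicast frames: those whose destination MAC is not in its CAM (MAC address) table. The minimal Python sketch below models a transparent learning switch to illustrate this; the class name `LearningSwitch` and the port names are hypothetical, and MAC aging is ignored for simplicity.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy model of a transparent (learning) Layer 2 switch, without MAC aging."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.cam = {}  # MAC address -> port it was last seen on as a source

    def forward(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        # Learn: the source MAC is reachable via the ingress port.
        self.cam[src_mac] = in_port
        out_port = self.cam.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            # Broadcast or unknown unicast: flood to every other port, hub-style.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination is on the ingress segment: filter the frame
        return [out_port]


sw = LearningSwitch(["Gi0/1", "Gi0/2", "Gi0/3"])
print(sw.forward("Gi0/1", "00:00:00:00:00:0a", "00:00:00:00:00:0b"))  # ['Gi0/2', 'Gi0/3']
print(sw.forward("Gi0/2", "00:00:00:00:00:0b", "00:00:00:00:00:0a"))  # ['Gi0/1']
```

The relevant point for this thread: a server's MAC only stays in a switch's table while that switch keeps seeing the MAC as a source address.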
To describe our infrastructure: we have 2 x 6513 core switches, each with a 4948 switch connected to it, and connected to these are 3020 switches in HP blade enclosure chassis. There are 4 x 3020 switches in each chassis, with 2 of them connected to one 4948 and 2 to the other. Port-channels connect the 6513s, 4948s and 3020s.
The server blades within the enclosures have 4 NIC ports, one connected to each 3020 switch. The four NIC ports are bonded together using a mixture of Transmit Load Balancing (TLB) and Adaptive Load Balancing (ALB).
This is the basis of one of my theories about the source of the problem: TLB bonding across switches may leave entries missing from the MAC address tables, so frames get flooded out of all ports. I'm not sure whether ALB has the same outcome.
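The theory can be checked against a simple model. The sketch below assumes a TLB-style team in which one NIC owns the team MAC and receives all inbound frames, while frames transmitted on another NIC carry that NIC's own MAC as the source (an assumption about the teaming driver; check its documentation). If the receiving NIC stays quiet for longer than the switch's MAC aging time, its CAM entry expires and inbound traffic to the team MAC is flooded. For brevity the blade's uplink switches are collapsed into one switch; `AgingSwitch` and the MAC labels are hypothetical.

```python
from dataclasses import dataclass, field

DEFAULT_MAC_AGING_S = 300  # Cisco IOS default MAC address-table aging time


@dataclass
class AgingSwitch:
    """Toy learning switch whose CAM entries expire after aging_s seconds."""

    ports: list
    aging_s: int = DEFAULT_MAC_AGING_S
    cam: dict = field(default_factory=dict)  # MAC -> (port, time last seen as source)

    def forward(self, now, in_port, src_mac, dst_mac):
        self.cam[src_mac] = (in_port, now)  # learn / refresh the source MAC
        entry = self.cam.get(dst_mac)
        if entry is not None and now - entry[1] >= self.aging_s:
            del self.cam[dst_mac]  # entry has aged out
            entry = None
        if entry is None:
            return [p for p in self.ports if p != in_port]  # unknown unicast: flood
        return [entry[0]]


# Hypothetical team: NIC A owns the team MAC and receives all inbound traffic;
# NIC B only transmits, using its own MAC as the source address.
TEAM_MAC, NIC_B_MAC, CLIENT_MAC = "team-a", "nic-b", "client"

sw = AgingSwitch(ports=["to-nicA", "to-nicB", "to-client"])
sw.forward(0, "to-nicA", TEAM_MAC, CLIENT_MAC)  # t=0: server transmits once via NIC A
print(sw.forward(10, "to-client", CLIENT_MAC, TEAM_MAC))  # ['to-nicA']
for t in range(20, 400, 20):
    sw.forward(t, "to-nicB", NIC_B_MAC, CLIENT_MAC)  # later replies all leave via NIC B
print(sw.forward(400, "to-client", CLIENT_MAC, TEAM_MAC))  # ['to-nicA', 'to-nicB'] (flooded)
```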
Something else I've noticed is that the port-channels are not consistent. The 6513 and 4948 port-channels use src-dst-ip load balancing, but the 3020 switches use src-mac load balancing. This may be unrelated, as it would only affect how traffic is load-balanced, but I thought I should mention it.
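As a rough illustration of that difference only: real Catalyst platforms use their own hash functions, so the two functions below are hypothetical, simplified stand-ins. With src-mac hashing, every frame from one source MAC lands on the same member link; with src-dst-ip, one server's flows to different destinations can spread across links.

```python
import ipaddress


def member_src_mac(src_mac: str, n_links: int) -> int:
    """Pick a member link from the source MAC alone (simplified)."""
    return int(src_mac.replace(":", ""), 16) % n_links


def member_src_dst_ip(src_ip: str, dst_ip: str, n_links: int) -> int:
    """Pick a member link from source XOR destination IP (simplified)."""
    s = int(ipaddress.ip_address(src_ip))
    d = int(ipaddress.ip_address(dst_ip))
    return (s ^ d) % n_links


server_mac = "00:1a:2b:3c:4d:5e"
server_ip = "172.16.10.5"
clients = ["172.16.20.1", "172.16.20.2", "172.16.20.3", "172.16.20.4"]

# src-mac: every frame from this server uses the same member link.
print({member_src_mac(server_mac, 2) for _ in clients})  # {0}
# src-dst-ip: flows to different clients can use different links.
print({member_src_dst_ip(server_ip, c, 2) for c in clients})  # {0, 1}
```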
Please let me know if anyone has come across a similar issue in the past and what you were able to do to remedy it.
Thank you in advance!
Craig
07-04-2009 04:36 AM
Is asymmetric routing possible? If so, you might want to read: http://www.cisco.com/en/US/products/hw/switches/ps700/products_tech_note09186a00801d0808.shtml.
07-04-2009 05:34 AM
Unicast flooding can be quite common in campus networks, and there can be a few causes. Asymmetric routing is the most common, but there are other reasons this might be happening:
http://www.cisco.com/en/US/products/hw/switches/ps700/products_tech_note09186a00801d0808.shtml
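The usual reasoning for the asymmetric-routing case can be put into rough numbers. Assume (simplified) that the switch only sees a host as a source when the host answers the router's ARP refresh; the CAM entry is learned then and expires after the MAC aging time, while the router keeps sending to the host until its ARP entry times out. The function name is ours; the defaults are the common Cisco ones (4-hour ARP timeout, 5-minute MAC aging).

```python
CISCO_ARP_TIMEOUT_S = 4 * 60 * 60  # IOS default ARP cache timeout: 4 hours
CISCO_MAC_AGING_S = 5 * 60         # Catalyst default MAC aging time: 5 minutes


def flood_seconds_per_arp_cycle(arp_timeout_s: int, mac_aging_s: int) -> int:
    """Seconds per ARP refresh cycle during which traffic to the host is flooded.

    Simplified asymmetric-routing model: the CAM entry is learned when the host
    answers the ARP refresh and expires mac_aging_s later; the router keeps using
    its ARP entry until arp_timeout_s.
    """
    return max(0, arp_timeout_s - mac_aging_s)


flood = flood_seconds_per_arp_cycle(CISCO_ARP_TIMEOUT_S, CISCO_MAC_AGING_S)
print(flood, f"{flood / CISCO_ARP_TIMEOUT_S:.1%}")  # 14100 97.9%
```

In this model, flooding disappears once the ARP timeout is no longer than the MAC aging time, which is why aligning those two timers is a common mitigation.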
Microsoft Network Load Balancing can also cause this if it isn't set up correctly for your environment. I recently saw a Gigabit backup between two servers kill a pretty big Layer 2 network. For some reason the servers were deployed on a big flat Layer 2 network, and the backup was configured to write to the NLB address. Clever guys....
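When NLB is suspected, it helps to know which MAC to look for in the CAM tables. NLB builds its cluster MAC from the cluster IP address: 02-BF followed by the four IP octets in unicast mode, and 03-BF followed by them in multicast mode. In unicast mode the hosts also mask that MAC as a source address by default, so switches never learn it and flood frames sent to it. A small helper to compute the address (the function name is ours; IGMP multicast mode is not covered):

```python
import ipaddress

# First two bytes of the cluster MAC for each NLB cluster operation mode.
NLB_PREFIX = {"unicast": (0x02, 0xBF), "multicast": (0x03, 0xBF)}


def nlb_cluster_mac(cluster_ip: str, mode: str = "unicast") -> str:
    """Derive the NLB cluster MAC, formatted Windows-style (XX-XX-XX-XX-XX-XX)."""
    octets = ipaddress.IPv4Address(cluster_ip).packed  # the 4 bytes of the cluster IP
    mac = bytes(NLB_PREFIX[mode]) + octets
    return "-".join(f"{b:02X}" for b in mac)


print(nlb_cluster_mac("10.1.2.3"))               # 02-BF-0A-01-02-03
print(nlb_cluster_mac("10.1.2.3", "multicast"))  # 03-BF-0A-01-02-03
```

On a Catalyst switch the same unicast address would be displayed as 02bf.0a01.0203.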
The best way to troubleshoot this is to look at the ARP tables on the Layer 3 routers for the VLAN where you are seeing the issue, in particular the entries for the hosts the flooded unicast traffic is destined to (possibly look at the hosts' ARP tables as well if they are on the same VLAN). Then look at the CAM tables on the Layer 2 switches in that VLAN for the MAC addresses you have ARP entries for. Then try to work out why you have ARP entries but no CAM entries (because that is what will be happening).
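The ARP-versus-CAM comparison can be scripted once the tables have been collected. A minimal sketch, assuming the router's ARP entries for the VLAN have been turned into an IP-to-MAC dictionary and one switch's MAC table into a set of MACs (the function names and data are hypothetical; MACs are normalized because Cisco, Windows and Linux print them differently):

```python
import re


def normalize_mac(mac: str) -> str:
    """Normalize Cisco (aaaa.bbbb.cccc), Windows (AA-BB-..) or Unix (aa:bb:..) MACs."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def arp_without_cam(arp_table: dict, cam_macs: set) -> list:
    """Return (ip, mac) pairs with an ARP entry but no CAM entry (flooding candidates)."""
    known = {normalize_mac(m) for m in cam_macs}
    return sorted(
        (ip, normalize_mac(mac))
        for ip, mac in arp_table.items()
        if normalize_mac(mac) not in known
    )


# Hypothetical data: router ARP entries for the server VLAN, and the MACs one
# access switch currently has in its table for the same VLAN.
arp = {"10.10.1.20": "0017.a477.0010", "10.10.1.21": "0017.a477.0012"}
cam = {"00:17:a4:77:00:10"}
print(arp_without_cam(arp, cam))  # [('10.10.1.21', '00:17:a4:77:00:12')]
```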
HTH
Andy
07-05-2009 12:45 PM
Thanks for the replies.
I am ashamed to say that all the servers in our DC sit in a single class B network on one VLAN. I'll say right now that this was done well before my time, and changing it would be difficult. There is a mixture of Linux and Windows servers, all of which have bonded interfaces, and all network traffic goes through these bonds. This troubles me, as I'm not sure if bonded across 2 or 4 switches is such a good idea. However, I have to prove that this is where the problem lies.
What I have noticed is that the CAM tables vary from switch to switch, which supports my argument about bonding, as it means some switches don't know where traffic should be going. I got the Linux team to change some of their servers from Transmit Load Balancing (TLB) to Adaptive Load Balancing (ALB), and this has helped remedy the issue.
The thing is, I'm really not sure how to go forward with this now.
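A missing entry only causes flooding on a switch that actually has to forward frames to that MAC, but it is still useful to quantify how much the tables diverge. A sketch, assuming each switch's MAC table for the server VLAN has already been collected into a set of MACs in one consistent format (switch names and MACs below are hypothetical):

```python
def macs_missing_by_switch(cam_tables: dict) -> dict:
    """Map each MAC seen anywhere to the sorted list of switches whose table lacks it.

    cam_tables: switch name -> set of MAC addresses learned in the VLAN
    (all in the same format). MACs present on every switch are omitted.
    """
    all_macs = set().union(*cam_tables.values())
    missing = {}
    for mac in sorted(all_macs):
        absent = sorted(sw for sw, macs in cam_tables.items() if mac not in macs)
        if absent:
            missing[mac] = absent
    return missing


tables = {
    "6513-A": {"0017.a477.0010", "0017.a477.0012"},
    "4948-A": {"0017.a477.0010"},
    "3020-1": {"0017.a477.0010", "0017.a477.0012"},
}
print(macs_missing_by_switch(tables))  # {'0017.a477.0012': ['4948-A']}
```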
07-05-2009 06:11 PM
There might be a clue in ". . . not sure if bonded across 2 or 4 switches is such a good idea. "
If L2 flows can happen such that a switch doesn't see the flow in both directions, the switch will unicast flood.
One possible solution might be to bond to the same physical or logical switch, but on different hardware ports, i.e., different cards on a chassis switch or different switch members on a stackable switch (e.g., 3750, 2975).