Regular TCP stream making 6506E act as a HUB for the specific VLAN?

Skjalg Eggen · ‎10-11-2011

I'm having some weird issues with my 6506E core switches. When I back up a Windows file cluster, the return TCP stream to the backup server gets "broadcast" on all uplink trunks carrying that particular VLAN.

I put a sniffer on the other switch of my 6500 pair and configured a SPAN session with the VLAN for these hosts as the source. Note that unicast traffic between these hosts should never hit this core, since none of the servers in question are connected to this 6500.

I started the sniffer, and what I get is a regular TCP stream from the file cluster to my backup server, with the ACK bit set on all packets. This makes no sense at all. I suspected that they had MS NLB or some other MS multicast service configured, but the flooded traffic I see on my sniffer looks remarkably unicast to me.

Most packets look like this:

32995 13838.400443 10.55.14.111 10.55.14.220 TCP 1514 53895 > 9601 [ACK] Seq=11169621 Ack=1 Win=32678 Len=1460

Frame 32995: 1514 bytes on wire (12112 bits), 1514 bytes captured (12112 bits)

Ethernet II, Src: HewlettP_2a:6c:88 (00:1b:78:2a:6c:88), Dst: HewlettP_1d:fd:80 (00:26:55:1d:fd:80)

Internet Protocol Version 4, Src: 10.55.14.111 (10.55.14.111), Dst: 10.55.14.220 (10.55.14.220)

Transmission Control Protocol, Src Port: 53895 (53895), Dst Port: 9601 (9601), Seq: 11169621, Ack: 1, Len: 1460

Data (1460 bytes)
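
To check whether a capture like this really contains a single source/destination MAC pair, the "Ethernet II" lines of a Wireshark text export can be tallied. This is only a rough sketch: the regular expression assumes the line layout shown in the decode above, and `tally_mac_pairs` is a made-up helper name, not part of any Wireshark tooling.

```python
import re
from collections import Counter

# Matches the "Ethernet II" line of a Wireshark text export (layout assumed
# to be the same as in the decode quoted in this post).
ETH_LINE = re.compile(
    r"Ethernet II, Src: \S+ \(([0-9a-f:]{17})\), Dst: \S+ \(([0-9a-f:]{17})\)",
    re.IGNORECASE,
)

def tally_mac_pairs(lines):
    """Count (source MAC, destination MAC) pairs found in decoded frames."""
    pairs = Counter()
    for line in lines:
        match = ETH_LINE.search(line)
        if match:
            pairs[(match.group(1).lower(), match.group(2).lower())] += 1
    return pairs

sample = [
    "Ethernet II, Src: HewlettP_2a:6c:88 (00:1b:78:2a:6c:88), Dst: HewlettP_1d:fd:80 (00:26:55:1d:fd:80)",
    "Internet Protocol Version 4, Src: 10.55.14.111 (10.55.14.111), Dst: 10.55.14.220 (10.55.14.220)",
]

for (src, dst), count in tally_mac_pairs(sample).items():
    print(f"{src} -> {dst}: {count} frame(s)")
```

If the flooded traffic is one unicast conversation, the tally should be dominated by a single pair.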

This issue affects my 2960S switches as well, but it does not affect my 4500 switches.

I'm at a loss here, guys.

Anyone got some hints as to where I should start looking?

This is jamming up all my trunks with 500-600 Mb/s, which affects all my VMware hosts and VMs and degrades the performance of the entire core network.

andrew.prince · ‎10-11-2011

Is the SVI for the VLAN on the 6506 the default gateway for the VLAN?

Skjalg Eggen · ‎10-11-2011

The default gateway is an ASA5520 HA pair. There is no SVI for this particular VLAN.

Joseph W. Doherty · ‎10-11-2011

Disclaimer

The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.

Liability Disclaimer

In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.

Posting

Might be unicast flooding. See http://www.cisco.com/en/US/products/hw/switches/ps700/products_tech_note09186a00801d0808.shtml

Skjalg Eggen · ‎10-11-2011

Thanks for your reply.

Unicast flooding seems to fit the symptoms.

The problem is that it can't be asymmetric routing, since the two hosts are on the same VLAN and subnet.

It can't be an STP issue either: it has been 2 days since the latest topology change for this VLAN. That should be longer, so I'll have to look into that also.

That leaves us with Cause 3: Forwarding Table Overflow. But I only see packets with the same source MAC address.

An overflow would likely create havoc for the 14 web servers in the same VLAN as well, but that's not the case.

What else can cause unicast flooding?

It would appear that the 6500 switches do not have an entry for the MAC of the backup server in their CAM tables, but the 4500 switches do, because the flooding stops at the 4500 switches, where the backup server is connected.
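
To make the suspected mechanism concrete, here is a toy model of a learning switch. It is a simplified sketch, not how the 6500 is actually implemented: the class and the port names are made up for illustration, and only the two MAC addresses are taken from the capture above.

```python
class LearningSwitch:
    """Toy layer-2 switch: learns source MACs, floods on a destination miss."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.cam = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        self.cam[src_mac] = in_port  # learn from the source address
        out_port = self.cam.get(dst_mac)
        if out_port is None:
            # Unknown unicast destination: flood to every port except ingress.
            return [p for p in self.ports if p != in_port]
        # Known destination: one port, or nothing if it sits on the ingress port.
        return [] if out_port == in_port else [out_port]


# Hypothetical port names; MACs are the ones seen in the capture.
core = LearningSwitch(["blade-uplink", "trunk-to-4500", "trunk-to-2960S", "trunk-to-core2"])
FILE_CLUSTER = "00:1b:78:2a:6c:88"
BACKUP_SERVER = "00:26:55:1d:fd:80"

# Backup server's MAC is not in the CAM: the frame leaves on every other port.
print(core.receive("blade-uplink", FILE_CLUSTER, BACKUP_SERVER))

# Once the switch sees a frame sourced from the backup server, it learns the port.
core.receive("trunk-to-4500", BACKUP_SERVER, FILE_CLUSTER)

# The same stream is now forwarded out of a single port.
print(core.receive("blade-uplink", FILE_CLUSTER, BACKUP_SERVER))
```

In this model the flooding stops as soon as the switch sees any frame sourced from the destination MAC, which is why a missing or expired CAM entry for the backup server would fit the symptoms.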

Skjalg Eggen · ‎10-11-2011

But why does it only flood the traffic from the MS file cluster?

We back up 20 more servers in this network without the flooding occurring.

This leads me to think that there must be something about the combination of the two hosts.

dominic.caron · ‎10-11-2011

Hi,

Is any server, or the backup server, connected to the network by two NICs on the same VLAN?

You could get this issue if NIC A and NIC B are in the same server and VLAN but have different IPs. Let's say NIC A holds the default route. If another server wants to connect to IP B on NIC B, it will send an ARP request, and since that is an L2 request, the answer will come out of NIC B. All later packets will exit the server through NIC A because it's the default route. NIC B's MAC then never shows up as a source again, so its CAM entry is not refreshed, and after the CAM timer expires you will have unicast flooding.
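
The two-NIC scenario can be sketched with a CAM table that ages entries out. This is an illustration under assumptions: `AgingCam`, the MAC labels and the port names are hypothetical, the 300-second aging time is assumed to be in effect on the switches involved, and a real switch ages entries with coarser timing than this model.

```python
AGING_TIME = 300  # seconds; assumed aging time for the VLAN

class AgingCam:
    """Toy CAM table whose entries expire when the MAC is not seen as a source."""

    def __init__(self):
        self.entries = {}  # MAC -> (port, time last seen as a source)

    def learn(self, mac, port, now):
        self.entries[mac] = (port, now)

    def lookup(self, mac, now):
        entry = self.entries.get(mac)
        if entry is None or now - entry[1] >= AGING_TIME:
            self.entries.pop(mac, None)  # entry missing or aged out
            return None
        return entry[0]


cam = AgingCam()
cam.learn("MAC_B", "port-B", now=0)      # the ARP reply leaves via NIC B once
for t in range(0, 601, 100):
    cam.learn("MAC_A", "port-A", now=t)  # all later traffic leaves via NIC A
    port = cam.lookup("MAC_B", now=t)    # the peer keeps sending to MAC B
    print(t, "forward to " + port if port else "flood")
```

In this model the frames addressed to MAC B are forwarded while the entry is fresh and flooded once 300 seconds have passed without MAC B appearing as a source.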

Skjalg Eggen · ‎10-11-2011

The backup server has only one NIC connected.

The MS file cluster has two NICs configured in a team. The other NIC in the team is connected through another C3020 blade switch, which is connected to the C4500 where the backup server is also connected.

The teams are configured, but I don't know the configuration. I suspect it is active/passive, with the active NIC on the C3020 connected to the 6500.

On a side note, the flooding seems to follow a consistent 10-minute interval: 10 minutes of flooding, then 10 minutes of no flooding, then 10 minutes of flooding again.

CAM timings are default.

C6506-Site2-Core1#show mac-address-table aging-time vlan 50

Vlan Aging Time

---- ----------

Global 300

C6506-Site1-Core1#show mac-address-table aging-time vlan 50

Vlan Aging Time

---- ----------

Global 300

C4500-Site1-Core2#show mac-address-table aging-time vlan 50

Vlan Aging Time Configured Aging Time

---- ---------- ---------------------

Global Vlan Admin Age: 300

50 300 300

dominic.caron · ‎10-11-2011

Since the file cluster uses teaming, try shutting one of its NIC ports on your switch to see if the problem is still there.

Skjalg Eggen · ‎10-11-2011

I think I'm on to something. The file cluster has 4 NICs in 2 teams. What I notice is that the team with the 10.55.14.111 address associated with it has a different MAC than the source MAC I see in Wireshark.

The source MAC I see in Wireshark actually belongs to a Microsoft cluster NIC with a self-assigned IP. This NIC is nowhere to be found in the network interface management folder on the server. Also, the HP teams are in the wrong order: the data team should come before the cluster team.

I'll have to check with the server NIC gurus tomorrow and have them change the order of the NICs so that the source MAC is the MAC of the data NIC team.

Skjalg Eggen · ‎10-12-2011

I finished the meeting with the server guys, and they are going to make the changes as soon as they get the go-ahead from the service owner.

I also noticed that the teams are auto-configured and are reporting a 2 Gb/s line speed. I don't know how HP does the teaming, but if it does load balancing and not active/passive, this could also be the cause, since we would then have asymmetric traffic.

Hopefully the changes will mitigate the issues we are having.